Page MenuHomePhabricator

[C] Default implicit function pointer conversions diagnostic to be an error
ClosedPublic

Authored by aaron.ballman on Aug 7 2022, 6:12 AM.

Details

Summary

Implicitly converting between incompatible function pointers in C is currently a default-on warning (it is an error in C++). However, this is very poor security posture. A mismatch in parameters or return types, or a mismatch in calling conventions, etc can lead to exploitable security vulnerabilities. Rather than allow this unsafe practice with a warning, this patch strengthens the warning to be an error (while still allowing users the ability to disable the error or the warning entirely to ease migration). Users should either ensure the signatures are correctly compatible or they should use an explicit cast if they believe that's more reasonable.

Diff Detail

Event Timeline

aaron.ballman created this revision.Aug 7 2022, 6:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 7 2022, 6:12 AM
Herald added a subscriber: jdoerfert. · View Herald Transcript
aaron.ballman requested review of this revision.Aug 7 2022, 6:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 7 2022, 6:12 AM
cor3ntin accepted this revision.Aug 7 2022, 7:04 AM
cor3ntin added a subscriber: cor3ntin.

LGTM.
I agree that this address potential security issues. I'm curious how many people will be affected, but hopefully by doing that early in the clang 16 cycle we might find out!
Thanks for working on this Aarron.

clang/include/clang/Basic/DiagnosticSemaKinds.td
8178

Fancy!

This revision is now accepted and ready to land.Aug 7 2022, 7:04 AM
erichkeane accepted this revision.Aug 8 2022, 6:15 AM

Yep, i think this is a good direction, and code changes look correct to me.

Rebased and fixing up clang-tools-extra tests caught by the precommit CI.

Herald added a project: Restricted Project. · View Herald TranscriptAug 9 2022, 3:52 AM
nikic added a subscriber: nikic.Aug 10 2022, 1:37 PM

This seems to break the test-suite build:

FAILED: CTMark/ClamAV/CMakeFiles/clamscan.dir/libclamav_readdb.c.o 
/root/llvm-compile-time-tracker/llvm-test-suite-build/tools/timeit --summary CTMark/ClamAV/CMakeFiles/clamscan.dir/libclamav_readdb.c.o.time /root/llvm-compile-time-tracker/llvm-project-build/bin/clang -DNDEBUG  -fexperimental-new-pass-manager -O0 -g   -w -Werror=date-time -DHAVE_CONFIG_H -I/root/llvm-compile-time-tracker/llvm-test-suite/CTMark/ClamAV -I/root/llvm-compile-time-tracker/llvm-test-suite/CTMark/ClamAV/zlib -DDONT_LOCK_DBDIRS -DC_LINUX -DFPU_WORDS_BIGENDIAN=0 -DWORDS_BIGENDIAN=0 -MD -MT CTMark/ClamAV/CMakeFiles/clamscan.dir/libclamav_readdb.c.o -MF CTMark/ClamAV/CMakeFiles/clamscan.dir/libclamav_readdb.c.o.d -o CTMark/ClamAV/CMakeFiles/clamscan.dir/libclamav_readdb.c.o   -c /root/llvm-compile-time-tracker/llvm-test-suite/CTMark/ClamAV/libclamav_readdb.c
/root/llvm-compile-time-tracker/llvm-test-suite/CTMark/ClamAV/libclamav_readdb.c:1145:49: error: incompatible function pointer types passing 'int (const struct dirent *, const struct dirent *)' to parameter of type '__compar_fn_t' (aka 'int (*)(const void *, const void *)') [-Wincompatible-function-pointer-types]
    qsort(dents, ndents, sizeof(struct dirent), dirent_compare);
                                                ^~~~~~~~~~~~~~
/usr/include/stdlib.h:831:20: note: passing argument to parameter '__compar' here
                   __compar_fn_t __compar) __nonnull ((1, 4));
                                 ^
1 error generated.

Is it intentional that this generates an error for benign differences (in pointer types only)?

This seems to break the test-suite build:

FAILED: CTMark/ClamAV/CMakeFiles/clamscan.dir/libclamav_readdb.c.o 
/root/llvm-compile-time-tracker/llvm-test-suite-build/tools/timeit --summary CTMark/ClamAV/CMakeFiles/clamscan.dir/libclamav_readdb.c.o.time /root/llvm-compile-time-tracker/llvm-project-build/bin/clang -DNDEBUG  -fexperimental-new-pass-manager -O0 -g   -w -Werror=date-time -DHAVE_CONFIG_H -I/root/llvm-compile-time-tracker/llvm-test-suite/CTMark/ClamAV -I/root/llvm-compile-time-tracker/llvm-test-suite/CTMark/ClamAV/zlib -DDONT_LOCK_DBDIRS -DC_LINUX -DFPU_WORDS_BIGENDIAN=0 -DWORDS_BIGENDIAN=0 -MD -MT CTMark/ClamAV/CMakeFiles/clamscan.dir/libclamav_readdb.c.o -MF CTMark/ClamAV/CMakeFiles/clamscan.dir/libclamav_readdb.c.o.d -o CTMark/ClamAV/CMakeFiles/clamscan.dir/libclamav_readdb.c.o   -c /root/llvm-compile-time-tracker/llvm-test-suite/CTMark/ClamAV/libclamav_readdb.c
/root/llvm-compile-time-tracker/llvm-test-suite/CTMark/ClamAV/libclamav_readdb.c:1145:49: error: incompatible function pointer types passing 'int (const struct dirent *, const struct dirent *)' to parameter of type '__compar_fn_t' (aka 'int (*)(const void *, const void *)') [-Wincompatible-function-pointer-types]
    qsort(dents, ndents, sizeof(struct dirent), dirent_compare);
                                                ^~~~~~~~~~~~~~
/usr/include/stdlib.h:831:20: note: passing argument to parameter '__compar' here
                   __compar_fn_t __compar) __nonnull ((1, 4));
                                 ^
1 error generated.

Thanks, I'll try to take care of this shortly!

Is it intentional that this generates an error for benign differences (in pointer types only)?

Yes, it's still UB.

For the record, so far we've seen this showing up in the following:

  • A case in a notoriously warning-heavy third-party library where we'd backported a file from a newer version and didn't quite fix all the internal API mismatches.
  • A ten-year-old bug in a local patch to another third-party library, where a function-pointer parameter was defined as returning a void and then assigned to a function-pointer that returns a void *.
  • A probably-innocuous bug in a local patch to yet another third-party library, where we were passing an int foo(char *, char *) function to GLIBC's qsort, which expects a function with a signature of int foo(void *, void *).
  • A case where https://gitlab.freedesktop.org/pixman/pixman/-/commit/e0d4403e78a7af8f4be4110ae25e9c3d1ac93a78 wasn't applied to our local version. This is also probably an innocuous case, not a "real" bug.
  • A case where SciPy's extension has a function that uses void * for a FILE * pointer (https://github.com/scipy/scipy/blob/main/scipy/spatial/_qhull.pyx#L187, second argument) while the corresponding C code's function has a real FILE * pointer (https://github.com/qhull/qhull/blob/master/src/libqhull_r/io_r.h#L97). The SciPy function also uses a void * for an argument of a struct type, which seems rather odd to me given that it just defined the type two lines earlier.

So, real bugs in 40% of cases, I'd say.

MaskRay added a subscriber: MaskRay.EditedAug 10 2022, 10:23 PM

Nit: I think it is useful mentioning -Wincompatible-pointer-types-Wincompatible-function-pointer-types in the commit message, not just in clang/docs/ReleaseNotes.rst, so that git log --grep incompatible-function-pointer-types will reveal something when the user knows incompatible-function-pointer-types but not "implicit function pointer conversions diagnostic".

For the record, so far we've seen this showing up in the following:

  • A case in a notoriously warning-heavy third-party library where we'd backported a file from a newer version and didn't quite fix all the internal API mismatches.
  • A ten-year-old bug in a local patch to another third-party library, where a function-pointer parameter was defined as returning a void and then assigned to a function-pointer that returns a void *.
  • A probably-innocuous bug in a local patch to yet another third-party library, where we were passing an int foo(char *, char *) function to GLIBC's qsort, which expects a function with a signature of int foo(void *, void *).
  • A case where https://gitlab.freedesktop.org/pixman/pixman/-/commit/e0d4403e78a7af8f4be4110ae25e9c3d1ac93a78 wasn't applied to our local version. This is also probably an innocuous case, not a "real" bug.
  • A case where SciPy's extension has a function that uses void * for a FILE * pointer (https://github.com/scipy/scipy/blob/main/scipy/spatial/_qhull.pyx#L187, second argument) while the corresponding C code's function has a real FILE * pointer (https://github.com/qhull/qhull/blob/master/src/libqhull_r/io_r.h#L97). The SciPy function also uses a void * for an argument of a struct type, which seems rather odd to me given that it just defined the type two lines earlier.

So, real bugs in 40% of cases, I'd say.

I double-checked these scenarios against the standard (where possible) and all of them are undefined behavior per the spec (and would be things I would expect a type sanitizer to report on), but I agree that not all of them seem likely to lead to a security vulnerability.

Nit: I think it is useful mentioning -Wincompatible-pointer-types-Wincompatible-function-pointer-types in the commit message, not just in clang/docs/ReleaseNotes.rst, so that git log --grep incompatible-function-pointer-types will reveal something when the user knows incompatible-function-pointer-types but not "implicit function pointer conversions diagnostic".

Good point, thanks!

I found another case of this warning, which is kinda borderline whether it really is an error not:

#include <stdlib.h>
static _Noreturn void my_exit(void) {
  exit(42);
}
__attribute__((noreturn)) void (*handler)(void) = my_exit;

The fix is simple though, just be consistent with _Noreturn vs __attribute__((noreturn)) on both function and function pointer.

I found another case of this warning, which is kinda borderline whether it really is an error not:

#include <stdlib.h>
static _Noreturn void my_exit(void) {
  exit(42);
}
__attribute__((noreturn)) void (*handler)(void) = my_exit;

The fix is simple though, just be consistent with _Noreturn vs __attribute__((noreturn)) on both function and function pointer.

Oh wow, that one is neat, weird, and I'm not certain what I think about it yet. __attribute__((noreturn)) is part of the function type, which is why you're allowed to write it on a function pointer declaration. _Noreturn is not part of the function type (nor is [[noreturn]] in C2x), so the two functions types do not match and that's why you get the diagnostic. This is the same logic that allows a diagnostic for a case like:

__attribute__((stdcall)) void func(int a, int b);
void (*fp)(int, int) = func; // Oops, calling convention mismatch

(on targets where these calling conventions exist and stdcall is not the default calling convention).

However, we don't care about the type mismatch when doing a redeclaration or when dropping the attribute, as in:

__attribute__((noreturn)) void my_exit(void);
void (*handler)(void) = my_exit; // Silently drops the attribute

_Noreturn void my_exit(void) { // No concerns about the type mismatch on redeclaration because we inherit __attribute__((noreturn))
}

so maybe we shouldn't worry about it here.

However, we don't care about the type mismatch when doing a redeclaration or when dropping the attribute, as in:

__attribute__((noreturn)) void my_exit(void);
void (*handler)(void) = my_exit; // Silently drops the attribute

_Noreturn void my_exit(void) { // No concerns about the type mismatch on redeclaration because we inherit __attribute__((noreturn))
}

so maybe we shouldn't worry about it here.

Are you sure? If the function types don't match in C++, I would expect that to become a different overload, not a redeclaration, right?

I found another case of this warning, which is kinda borderline whether it really is an error not:

#include <stdlib.h>
static _Noreturn void my_exit(void) {
  exit(42);
}
__attribute__((noreturn)) void (*handler)(void) = my_exit;

The fix is simple though, just be consistent with _Noreturn vs __attribute__((noreturn)) on both function and function pointer.

Oh wow, that one is neat, weird, and I'm not certain what I think about it yet.

FWIW, the source of the code here is a piece of code from glibc, distributed as part of gnulib and in gettext (in various older/newer copies of it). The bug has itself been fixed in gnulib already almost 2 years ago, as result of the Clang warning at the time: https://git.savannah.gnu.org/cgit/gnulib.git/commit/lib/obstack.c?id=0cc39712803ade7b2d4b89c36b143dad72404063 However in this case, a (potentially old) packaged tarball of gettext contained two older copies of the file.

A different but slightly similar case, where I've run into these errors too, FWIW: On ARM, when compiling with a hardfloat target, Clang does consider a function pointer with __attribute__((pcs("aapcs-vfp"))) different from one without, even if the default still in the end would end up equal to it. I guess the problem there is that Clang doesn't know what the default implicit calling convention is (and that's something that ends up resolved on the LLVM side while lowering code) - while e.g. Clang does know that __cdecl is equivalent to not declaring any calling convention at all.

A different but slightly similar case, where I've run into these errors too, FWIW: On ARM, when compiling with a hardfloat target, Clang does consider a function pointer with __attribute__((pcs("aapcs-vfp"))) different from one without, even if the default still in the end would end up equal to it. I guess the problem there is that Clang doesn't know what the default implicit calling convention is (and that's something that ends up resolved on the LLVM side while lowering code) - while e.g. Clang does know that __cdecl is equivalent to not declaring any calling convention at all.

I think that one is a bug with the target architecture not properly specifying what the default calling convention is that case. My guess is they're falling back on cdecl as the default calling convention in the frontend when they should be looking at the target.

However, we don't care about the type mismatch when doing a redeclaration or when dropping the attribute, as in:

__attribute__((noreturn)) void my_exit(void);
void (*handler)(void) = my_exit; // Silently drops the attribute

_Noreturn void my_exit(void) { // No concerns about the type mismatch on redeclaration because we inherit __attribute__((noreturn))
}

so maybe we shouldn't worry about it here.

Are you sure? If the function types don't match in C++, I would expect that to become a different overload, not a redeclaration, right?

The changes here only impact C so it shouldn't be a big deal, but we do not create an overload set for them in C with the overloadable attribute: https://godbolt.org/z/1ETe6ff5s

I tracked down the noreturn behavior and this turns out to be intentional for the past ten years and is a result of a bug report I can't see: https://github.com/llvm/llvm-project/blob/main/clang/lib/Sema/SemaOverload.cpp#L1526 (https://github.com/llvm/llvm-project/commit/48c69106e4ce1 was what added the functionality.)