Page MenuHomePhabricator

Allow system header to provide their own implementation of some builtin
ClosedPublic

Authored by serge-sans-paille on Dec 5 2019, 11:09 AM.

Details

Summary

If a system header provides an (inline) implementation of some of their function, clang still matches on the function name and generate the appropriate llvm builtin, e.g. memcpy. This behavior is in line with glibc recommendation « users may not provide their own version of symbols » but doesn't account for the fact that glibc itself can provide inline version of some functions.

It is the case for the memcpy function when -D_FORTIFY_SOURCE=1 is on. In that case an inline version of memcpy calls __memcpy_chk, a function that performs extra runtime checks. Clang currently ignores the inline version and thus provides no runtime check.

This code fixes the issue by detecting functions whose name is a builtin name but also have a system-provided implementation.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptDec 5 2019, 11:09 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
serge-sans-paille marked an inline comment as done.

test case updated

Thanks for looking into this!

I dunno what implications this has elsewhere, but I'd like to note a few things about this general approach:

  • AIUI, this'll only work for FORTIFY functions which are literally calling __builtin___foo_chk; an equivalent implementation without such a builtin, like below, should be blocked by other code in clang, since it results in infinite recursion. glibc implements quite a few (most?) FORTIFY'ed functions in this way.
void *memcpy(void *a, void *b, size_t c) {
  if (__builtin_object_size(a, 0) == -1)
    return memcpy(a, b, c);
  return __memcpy_chk(a, b, c, __builtin_object_size(a, 0));
}
  • This change only gives us -D_FORTIFY_SOURCE=1-level checking here. If we want to offer better support for -D_FORTIFY_SOURCE=2 (including potentially FORTIFY's compile-time diagnostics), which is AIUI what most of the world uses, we'll need to add clang-specific support into glibc's headers. At that point, this patch becomes a nop, since the FORTIFY'ed memcpy becomes an overload for the built-in one.

In other words, I think this patch will improve our support for the pieces of glibc's FORTIFY that're implemented with __builtin___*_chk, but I don't think there's much more room to incrementally improve beyond that before we need to start hacking on glibc directly. If we get to that point, the changes we need to make in glibc will likely obviate the need for this patch.

So if your overarching goal here is to make clang support glibc's FORTIFY as-is a bit better, that's great and I'm personally cool with this patch (again, not being deeply aware of what interactions this may have in a non-FORTIFY'ed context). If the intent is to build off of this to try and make glibc's FORTIFY story with clang a great one, I'm unsure how useful this patch will be in the long run.

clang/lib/AST/Decl.cpp
3007

style nit: const FunctionDecl * is the preferred ordering in most of LLVM

3188

style nit: please don't use braces here or below, for consistency with nearby code.

It looks like this patch accidentally suppresses certain warnings when _FORTIFY_SOURCE is enabled. For example, Sema::CheckMemaccessArguments only gets called for recognized builtin functions.

Fix interaction with fortify warnings (cc @efriedma)

Also I've checked interactions with the test suite I wrote here https://github.com/serge-sans-paille/fortify-test-suite/ and clang now passes all the dynamic check, and only fails a few static checks, which is a great improvement. I still need to double check all the examples pointed out by @george.burgess.iv .

  • Be less intrusive in the getBuiltinID() system, just hook inside CGExpr.cpp
  • Take into account @george.burgess.iv remark on builtins referencing themselves in an override. Add a test case for that scenario.
clang/lib/AST/Decl.cpp
3006

Note to reviewers: I'm not super happy with that name.

Should we also have a quick test-case in Sema/ verifying that we still get the warnings that Eli mentioned?

clang/lib/AST/Decl.cpp
3006

Yeah, isReplaceableBuiltin() sounds like "can this be replaced at all?" rather than "is this acting as a replacement for something else?"

isReplacementForBuiltin() pops to mind, though in a lot of cases, these functions may end up calling the "actual" builtin (as you handle later in the patch), so maybe isProxyForBuiltin() is better?

3010

const FunctionDecl *

serge-sans-paille marked an inline comment as done.Dec 11 2019, 12:53 PM
serge-sans-paille added inline comments.
clang/lib/AST/Decl.cpp
3006

What about isBuiltInWrapper() ?

clang/lib/AST/Decl.cpp
3006

WFM

@george.burgess.iv : take into account reviews, extra testing and function renaming.

Improve test case portability.

Just a few more nits, and this LGTM. Thanks again!

IDK if Eli wanted to have another look at this; please wait until tomorrow before landing to give him time to comment.

clang/lib/AST/Decl.cpp
3006

nit: If we're going with this naming, I think it's important to highlight that we're querying if this is a definition. So isInlineBuiltinDefinition()?

clang/test/CodeGen/memcpy-nobuiltin.c
6

Generally, I think we try to keep diag checking and codegen checking separate, but this is simple enough that I don't think it's a big deal.

Please use clang's -verify + expected-warning instead of CHECKs here.

Looks like, among others, test/CodeGen/const-init.c combines -verify with FileCheck'ing codegen.

serge-sans-paille marked 6 inline comments as done.

Handle comments.

serge-sans-paille marked an inline comment as done.Dec 13 2019, 12:50 AM
efriedma added inline comments.Dec 13 2019, 11:15 AM
clang/lib/AST/Decl.cpp
3019

I'm a little concerned about this; we're subtly changing our generated code based on the "system-headerness" of the definition. We generally avoid depending on this for anything other than warnings. It could lead to weird results with preprocessed source, or if someone specifies their include paths incorrectly.

serge-sans-paille marked an inline comment as done.Dec 13 2019, 1:30 PM
serge-sans-paille added inline comments.
clang/lib/AST/Decl.cpp
3019

With the extra check on builtin status, it may no longer be necessary, let me check that.

serge-sans-paille marked an inline comment as done.

Take comment into account.

@efriedma validation passes without the checks you were pointed out, see https://github.com/serge-sans-paille/llvm-project/pull/4/checks
Thanks for making the code simpler!

clang/lib/AST/Decl.cpp
3019

@efriedma I confirm this test is no longer needed.

@serge-sans-paille any blocker for this to land? Would be great to have it in version 10. Thanks

@sylvestre.ledru : I'd like a confirmation from @efriedma first!

efriedma accepted this revision.Jan 9 2020, 2:58 PM

LGTM

This revision is now accepted and ready to land.Jan 9 2020, 2:58 PM
This revision was automatically updated to reflect the committed changes.

This caused a linker error in chromium:

ld.lld: error: undefined symbol: __warn_memset_zero_len

Apparently now that the glibc memset is being used, __warn_memset_zero_len gets called from libc++ code (https://github.com/llvm/llvm-project/blob/b72a8c65e4e34779b6bc9e466203f553f5294486/libcxx/include/__bit_reference#L371).

A repro:

$ cat t.cpp
#include <bitset>
void useit(std::bitset<3> &it);
std::bitset<3> getit();
int main() {
  std::bitset<3> bits = getit();
  bits.set();
  useit(bits);
}

$ clang -S -O2 t.cpp  -o - -stdlib=libc++ -D_FORTIFY_SOURCE=1 | grep warn_mem
        callq   __warn_memset_zero_len
rnk added a subscriber: rnk.Jan 15 2020, 3:29 PM

This caused a linker error in chromium:

ld.lld: error: undefined symbol: __warn_memset_zero_len

Apparently now that the glibc memset is being used, __warn_memset_zero_len gets called from libc++ code (https://github.com/llvm/llvm-project/blob/b72a8c65e4e34779b6bc9e466203f553f5294486/libcxx/include/__bit_reference#L371).

I can reproduce the behavior locally, but without the linker error. On my system, (RHEL7.5) libc_nonshared.a provides that symbol. I've checked on other systems and it's available in that library on Ubuntu 14.04.

On other systems, this symbol is not available, neither in the lib nor in the header, because the associated gcc version is high enough. Can you try to reproduce with:

clang -S -O2 -fgnuc-version=5.0 t.cpp  -o - -stdlib=libc++ -D_FORTIFY_SOURCE=1

@akhuang : I've reported https://sourceware.org/bugzilla/show_bug.cgi?id=25399 to glibc. According to me this is more of a glibc problem than this patch, and as we have a workaround with -fgnuc-version=5.0 I think we could apply the patch back.

(It's interesting to me if gcc doesn't warn about that libcxx code, since the whole point of the gnuc 5.0 check there was "the compiler should check this for us now"...)

If a glibc-side fix for this does materialize, IMO -fgnuc-version=5.0 isn't a good path forward.

It's best if workarounds are limited in scope, lifespan, and user impact:

  • Presumably -fgnuc-version=5.0 implies many more changes that are unrelated to this particular bug.
  • Glibc upgrades can take years to materialize for users[1].
  • A lot of packages build with -D_FORTIFY_SOURCE=2 by default. Requiring extra CPPFLAGS (either -D_FORTIFY_SOURCE=0 or -fgnuc-version=5.0) would break CC=clang make-style builds that already work today, if any statement in them folds to memset(x, non_zero, 0); after optimizations.

Looks like this landed on the branch, so unless I'm mistaken, we probably need to revert there, as well?

[1] -- Anecdotally, I upgraded a project's host toolchain from using glibc 2.15 to 2.17 last year. I wanted to do 2.19, but 2.17 usage was still large enough that 2.19 wasn't an option.

rnk added a comment.Jan 16 2020, 10:27 AM

I agree, raising -fgnuc-version is not an acceptable workaround, and we can't ship a clang that doesn't work with old versions of glibc. I think we need a different workaround. The simplest one I can think of is to make __warn_memset_zero_len a recognized builtin that generates no code.

Clang already emits this warning when it would be useful, before inlining. memset(ptr, c, n) when n is zero at runtime or at compile time after inlining should not be a warning or link error.

I believe we also branched 10.0 between this change and its revert. Can you please follow up and merge the eventual fix to the release?

In D71082#1824563, @rnk wrote:

The simplest one I can think of is to make __warn_memset_zero_len a recognized builtin that generates no code.

+1, simple and easy.

I believe we also branched 10.0 between this change and its revert. Can you please follow up and merge the eventual fix to the release?

Sure.

The simplest one I can think of is to make __warn_memset_zero_len a recognized builtin that generates no code.

+1, simple and easy.

See https://reviews.llvm.org/D72869

This caused a regression when using -fsanitize=memory -D_FORTIFY_SOURCE=2. MemorySanitizer requires all code, including system libraries, to be instrumented. Else it reports false positives. For glibc, it has lots of interceptors to make sure glibc doesn't have to be instrumented (as far as I understand). So people usually don't have an instrumented glibc in their sysroot. But after this change, memcpy calls aren't always intercepted. I filed https://bugs.llvm.org/show_bug.cgi?id=44779 with a standalone repro.

I'm not sure what the best fix for that is. Unless it's something obvious, we should probably revert this while we figure things out.

Until recently, both CrOS and Android disabled FORTIFY wholesale when any of asan/msan/tsan were active. Bionic recently got support for FORTIFY playing nicely with sanitizers, but that support boils down to "turn off all the FORTIFY runtime checks, but leave the ones that generate compiler diags on."

A quick fix here would likely be to disable this builtin interception functionality if a sanitizer that isn't ubsan/CFI is enabled. Since all of this code + the nop __warn builtin we added is specifically targeted at supporting glibc's FORTIFY patterns, I don't think adding a !sanitizers check would be too ugly on its own?

We believe this change (relanded as https://reviews.llvm.org/rGd437fba8ef626b6d8b7928540f630163a9b04021) is causing a mis-compile in Linux kernel 4.4 builds resulting in local test failures.
Chrome OS bug: https://bugs.chromium.org/p/chromium/issues/detail?id=1066638

I will try to see if I can reduce it further but will it be ok to revert this change meanwhile?

I'd prefer not to revert a change that's three months old without some sort of evidence the issue is actually the compiler's fault. If nobody else has seen an issue, it's probably okay to let it sit for a day or two.

Sure, I am trying to root cause the issue. Will report back hopefully soon.

I was able to reduce to following:

typedef unsigned int u32;
typedef unsigned long long u64;
typedef unsigned long size_t;

void fortify_panic(const char *name) __attribute__((noreturn)) ;
void __read_overflow(void) ;
void __read_overflow2(void) ;
void __write_overflow(void) ;

extern void *memcpy(void *to, const void *from, size_t len);
extern void *__memcpy(void *to, const void *from, size_t len);

extern inline __attribute__((unused)) __attribute__((no_instrument_function)) __attribute__((always_inline)) __attribute__((gnu_inline)) void *memcpy(void *p, const void *q, size_t size)
{
 size_t p_size = __builtin_object_size(p, 0);
 size_t q_size = __builtin_object_size(q, 0);
 if (__builtin_constant_p(size)) {
  if (p_size < size)
   __write_overflow();
  if (q_size < size)
   __read_overflow2();
 }
 if (p_size < size || q_size < size)
  fortify_panic(__func__);
 return __builtin_memcpy(p, q, size);
}

static inline __attribute__((unused)) __attribute__((no_instrument_function)) void
memcpy_fromio(void *dst, const volatile void *src, size_t count)
{
 memcpy(dst, (const void *)src, count);
}

u64 sst_shim32_read64(void *addr, u32 offset)
{
 u64 val;
 memcpy_fromio(&val, addr + offset, sizeof(val));
 return val;
}

Compiling with

clang -Qunused-arguments -D_FORTIFY_SOURCE=2 -fno-omit-frame-pointer -fno-stack-protector  -nostdinc  -fno-strict-aliasing -fno-common  -std=gnu89 -fno-PIE -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -m64 -mno-80387 -mstack-alignment=8 -mtune=generic -mno-red-zone -mcmodel=kernel -funit-at-a-time  -pipe  -mretpoline-external-thunk -fno-delete-null-pointer-checks  -Os -fstack-protector-strong -mno-global-merge -no-integrated-as  -fno-omit-frame-pointer -fno-optimize-sibling-calls -ftrivial-auto-var-init=pattern  -pg -mfentry  -fno-strict-overflow -fno-merge-all-constants -fno-stack-check  -c -o test.o test.c -target x86_64-cros-linux-gnu

and objdump -drW test.o
before:
No memcpy is emitted as clang is able to optimize it.

0000000000000000 <sst_shim32_read64>:
   0:	e8 00 00 00 00       	callq  5 <sst_shim32_read64+0x5>
			1: R_X86_64_PLT32	__fentry__-0x4
   5:	55                   	push   %rbp
   6:	48 89 e5             	mov    %rsp,%rbp
   9:	89 f0                	mov    %esi,%eax
   b:	48 8b 04 07          	mov    (%rdi,%rax,1),%rax
   f:	5d                   	pop    %rbp
  10:	c3                   	retq

After:

A call to memcpy is emitted.

0000000000000000 <sst_shim32_read64>:
   0:	e8 00 00 00 00       	callq  5 <sst_shim32_read64+0x5>
			1: R_X86_64_PLT32	__fentry__-0x4
   5:	55                   	push   %rbp
   6:	48 89 e5             	mov    %rsp,%rbp
   9:	53                   	push   %rbx
   a:	48 83 ec 10          	sub    $0x10,%rsp
   e:	65 48 8b 04 25 28 00 	mov    %gs:0x28,%rax
  15:	00 00 
  17:	48 89 45 f0          	mov    %rax,-0x10(%rbp)
  1b:	48 b8 aa aa aa aa aa 	movabs $0xaaaaaaaaaaaaaaaa,%rax
  22:	aa aa aa 
  25:	48 8d 5d e8          	lea    -0x18(%rbp),%rbx
  29:	48 89 03             	mov    %rax,(%rbx)
  2c:	89 f6                	mov    %esi,%esi
  2e:	48 01 fe             	add    %rdi,%rsi
  31:	ba 08 00 00 00       	mov    $0x8,%edx
  36:	48 89 df             	mov    %rbx,%rdi
  39:	e8 00 00 00 00       	callq  3e <sst_shim32_read64+0x3e>
			3a: R_X86_64_PLT32	memcpy-0x4
  3e:	48 8b 03             	mov    (%rbx),%rax
  41:	65 48 8b 0c 25 28 00 	mov    %gs:0x28,%rcx
  48:	00 00 
  4a:	48 3b 4d f0          	cmp    -0x10(%rbp),%rcx
  4e:	75 07                	jne    57 <sst_shim32_read64+0x57>
  50:	48 83 c4 10          	add    $0x10,%rsp
  54:	5b                   	pop    %rbx
  55:	5d                   	pop    %rbp
  56:	c3                   	retq   
  57:	e8 00 00 00 00       	callq  5c <sst_shim32_read64+0x5c>
			58: R_X86_64_PLT32	__stack_chk_fail-0x4

At this point, it is not clear to me if clang is doing anything wrong here is or this a bug in kernel 4.4 that it is using a regular memcpy for IO which cannot use regular memcpy.
And so I suspect we need to backport https://github.com/torvalds/linux/commit/c2327da06b33d8e1093ce2c28f395bc500d1b0d3 to older kernel versions.

@nickdesaulniers wdyt?

Unfortunately, cherry-picking the kernel patches didn't work including latest memcpy for x86 (https://github.com/torvalds/linux/commit/170d13ca3a2fdaaa0283399247631b76b441cca2 and https://github.com/torvalds/linux/commit/c228d294f2040c3a5f5965ff04d4947d0bf6e7da ).
Also tried ToT Linux kernel but still the same problem.

So far, it is not clear to me whether clang is at fault or Linux kernel has incorrect assumptions about memcpy. Also note that memcpy_fromio has the argument type as volatile void* which I believe is to inhibit some compiler optimizations.

@efriedma do you have any suggestions?

serge-sans-paille added a comment.EditedMar 31 2020, 11:27 PM

@manojgupta prior to this patch, clang was using its version of memcpy and not the one provided inline. I slightly modified your reproducer to showcase that aspect: https://godbolt.org/z/NmxC0v

rnk added a comment.Apr 1 2020, 9:58 AM

It's unsurprising that the Linux kernel's own Fortify implementation is incompatible with Clang. The whole feature should never have been implemented in the library to begin with, but here we are. I think the Linux kernel folks would prefer it if we can fix forward and find a way to make their Fortify implementation work, though.

For fixing forward, the question is, why did LLVM emit a standalone call to mempcy? Did LLVM inline the thing marked always_inline? Or did __builtin_memcpy compile down to call i8* @memcpy(...) instead of call void @llvm.memcpy.*?

Yes, it'd be nice if all of the FORTIFY handling can be improved. For a simple call like memcpy of 8 bytes in the example, there is no reason to emit all these stack/range checks since they'd degrade memcpy performance.

I still think this change should be reverted if it can't handle Linux kernel's FORTIFY implementation.

Yes, it'd be nice if all of the FORTIFY handling can be improved. For a simple call like memcpy of 8 bytes in the example, there is no reason to emit all these stack/range checks since they'd degrade memcpy performance.

I still think this change should be reverted if it can't handle Linux kernel's FORTIFY implementation.

I think there's a misunderstanding there. This patch has nothing to do with clang being unable to correctly handle the kernel's memcpy implementation. The only thing it does is actually *picking* the kernel's memcpy implementation instead of relying on the builtin, non-fortified, version of it.

For a more direct comparison, I offer https://godbolt.org/z/fqAhUC . The lack of optimization in the later case is because we're forced to mark the call to __builtin_memcpy in the inline memcpy as nobuiltin. If we instead rename things, this issue doesn't happen: https://godbolt.org/z/FKNTWo.

All other FORTIFY impls I know of defer to _chk functions for checking that's not guarded by __builtin_constant_p, rather than having size checks inline like the kernel. Both GCC and Clang have special knowledge of these intrinsics, so they can optimize them well: https://godbolt.org/z/L7rVHp . It'd be nice if the kernel's FORTIFY were more like all of the other existing ones in this way.

Deferring to _chk builtins has the side-benefit that the inline memcpy is often smaller, which increases the inlineability of any functions that memcpy gets inlined into(*). The down-side is that the kernel now needs to carry definitions for __memcpy_chk et al.

(*) -- not in an underhanded way. It's just that any condition that depends on __builtin_constant_p or __builtin_object_size(obj) is guaranteed to be folded away at compile-time; not representing them in IR is more "honest" to anything that's trying to determine the inlinability of a function.

It'd be nice if the kernel's FORTIFY were more like all of the other existing ones in this way.

Please file a bug with some more details, and let's make it happen.

I opened https://github.com/ClangBuiltLinux/linux/issues/979 to see if we can fix this in Linux kernel.