Basically, inline builtin definition are shadowed by externally visible
redefinition. This matches GCC behavior.
Details
Diff Detail
Event Timeline
Yes; GCC does behave this way. It does not consider a non-gnu-inline redefinition an error, and it does seem to prefer the non-gnu-inline redeclaration when both are present, AFAICT. The test is verifying that behavior correctly. This patch is fixing the test case, the reported reduced cases from @manojgupta link, @nathanchance link, and myself link, and fixing the kernel builds link.
Further, I did a build+boot tests of:
- mainline x86_64 + CONFIG_FORTIFY_SOURCE=y
- mainline x86_64 + CONFIG_FORTIFY_SOURCE=y + CONFIG_LTO_CLANG_FULL=y
- mainline x86_64 + CONFIG_FORTIFY_SOURCE=y + CONFIG_LTO_CLANG_THIN=y
clang/lib/CodeGen/CodeGenFunction.cpp | ||
---|---|---|
1318 | do we really want to be iterating every redaclaration like this, even for non-inline builtin declarations? Is there a way to avoid the below loop for most functions? Perhaps we should be doing this when we hit a redeclaration instead? Or wrap all this work in a check that there's a builtin ID associated with the FD? | |
1328 | If there are multiple redeclarations, do we want to be erasing the clone each time, or can we break out of this loop? | |
clang/test/CodeGen/strlen-inline-builtin-redecl.c | ||
11–13 | Has this example been formatted? Does rotating the attributes to the front help? | |
43 | I think this line can be dropped without affecting the test. Or consider using any of the further reduced test cases? |
clang/lib/CodeGen/CodeGenFunction.cpp | ||
---|---|---|
1301–1302 | I don't think we want to do all this work if just Fn; ie. create a new std::string with .inline suffix for every function we're going to generate code (IR) for. How about we add an additional unlikely guard: if (FD->getBuiltinID() && FN) { Because in the usual case, FD both has a builtin ID and is an inline builtin declaration, while in the exceptional case that this patch addresses, FD has a builtin ID but is not an inline builtin declaration. | |
1315–1317 | Perhaps in Sema::CheckFunctionDeclaration? I see there is where we detect redeclarations. The calls from there to FunctionDecl::setPreviousDeclaration() seem to set up the redecl chain. Perhaps this exceptional case (or both cases, even) would be handled better there? cc @rsmith @aaron.ballman in case they have feedback/tips/cycles. | |
clang/test/CodeGen/strlen-inline-builtin-redecl.c | ||
9 | unused decl | |
11 | Do you mind wrapping this to 80 chars wide? I suspect if you put the two __attribute__s first, then the formatter will do a better job. You can also combine these, a la __attribute__((always_inline, gnu_inline)) to cut down on line length. |
clang/lib/CodeGen/CodeGenFunction.cpp | ||
---|---|---|
1301–1302 | Is it correct to gate this on whether it's a builtin or not? I thought that builtin-like (e.g., the usual pile of attributes) user code should also have the same effect, shouldn't it? | |
1315–1317 | I don't know that it's a good idea to modify the redeclaration chain in this case. The comments on the chain are pretty clear that it's a temporal chain where "previous" means previously declared in relation to the current declaration. @rsmith may feel differently, however. |
Bumping for an update here. We can tolerate a build breakage for our older kernels over the weekend, but we should really try to get this resolved by EOW, otherwise we need to look into reverting:
- 3d6f49a56995b845c40be5827ded5d1e3f692cec Tue Sep 28 13:24:25 2021 +0200 (breakage)
- bd379915de38a9af3d65e19075a6a64ebbb8d6db Tue Sep 28 16:07:33 2021 +0200 (attempted fix forward of 3d6f49a56995b845)
- 0d76d4833dd2815e0b1c786250f474d222f6a0a1 Tue Sep 28 11:30:37 2021 -0700 (revert of 3d6f49a56995b845)
- c3717b6858d32d64514a187ede1a77be8ba4e542 Tue Sep 28 21:00:47 2021 +0200 (reland, introduced kernel breakage)
- 0f0e31cf511def3e92244e615b2646c1fd0df0cd Mon Oct 4 22:26:25 2021 +0200 (fix forward)
clang/lib/CodeGen/CodeGenFunction.cpp | ||
---|---|---|
1301–1302 | What do you mean? I'm sorry, I don't quite follow. | |
1315–1317 | Sorry, I don't quite follow whether your approving of the current approach or dismissive? |
clang/lib/CodeGen/CodeGenFunction.cpp | ||
---|---|---|
1301–1302 | From the test cases below: extern inline __attribute__((always_inline)) __attribute__((gnu_inline)) unsigned long strlen(const char *p) { return 1; } unsigned long mystrlen(char const *s) { return strlen(s); } unsigned long strlen(const char *s) { return 2; } These redeclarations resolve a particular way by GCC and this patch is intending to match that behavior. My question ultimately boils down to whether that also happens for this code, where the function is not a builtin but looks like one due to the attributes: extern inline __attribute__((always_inline)) __attribute__((gnu_inline)) unsigned long wahoo(const char *p) { return 1; } unsigned long mywahoo(char const *s) { return wahoo(s); } unsigned long wahoo(const char *s) { return 2; } If this also reorders, then I don't think we can look at whether FD->getBuiltinID() != 0 to decide whether to do the reordering dance because arbitrary user functions aren't Clang builtins and so they'd not get the correct behavior. Does that make sense? | |
1315–1317 | I don't think we should modify the redecl chain from CheckFunctionDeclaration() -- this case would create a redeclaration chain whose previous link was not temporally the previous declaration. There might be another approach so we can avoid replaceAllUsesWith(). One possibility (no idea how feasible or what explodes) is to modify FunctionDecl::getDefinition() to look through the chain to return the best definition when there are multiple definitions to pick from. |
clang/lib/CodeGen/CodeGenFunction.cpp | ||
---|---|---|
1301–1302 |
It does; https://godbolt.org/z/bbrox7f6e.
Yes, thanks for the additional info. In this case, I guess we can disregard my feedback that started this thread, marking it as done? Perhaps @serge-sans-paille should add such a non-builtin test case as part of the change? |
clang/lib/CodeGen/CodeGenFunction.cpp | ||
---|---|---|
1301–1302 | I think you have a valid concern about the extra allocations, but I'm not certain of a better predicate to use. My intuition is that the overhead here won't be unacceptable, but it'd be good to know we're not regressing compile time performance significantly. Additional test coverage with a comment as to why we're testing it is a good idea! |
clang/lib/CodeGen/CodeGenFunction.cpp | ||
---|---|---|
1301–1302 | Perhaps we can first test whether this FuctionDecl is a redecl, then do the allocation, then check if the .inline suffix? That way we avoid creating the new string unless we're codgen'ing a redecl, which should be uncommon in practice. |
clang/lib/CodeGen/CodeGenFunction.cpp | ||
---|---|---|
1301–1302 | That could save us a bit of perf, but I think redecls are actually quite common because a definition is itself a declaration, so having a decl in a header file and defn in a source file will produce a redecl chain. |
clang/lib/CodeGen/CodeGenFunction.cpp | ||
---|---|---|
1301–1302 | Ah, ok then.
Yeah, with that and fixes to the other small nits in the test case, then I think this is ready to land. |
- Add a test case to ensure we keep the right behavior for non-intrinsic gnu inline
- walk the redecl chain before doing an extra string alloc
Actually, it looks like:
diff --git a/clang/lib/Sema/SemaDecl.cpp b/clang/lib/Sema/SemaDecl.cpp index 69d2ef631872..8e77cdef2ed5 100644 --- a/clang/lib/Sema/SemaDecl.cpp +++ b/clang/lib/Sema/SemaDecl.cpp @@ -10927,6 +10927,10 @@ bool Sema::CheckFunctionDeclaration(Scope *S, FunctionDecl *NewFD, } if (Redeclaration) { + if (cast<FunctionDecl>(OldDecl)->isInlineBuiltinDeclaration() && !NewFD->isInlineBuiltinDeclaration()) { + // Set a flag on NewFD that it's a shadowed gnu_inline that should be + // emitted, or on OldDecl that it should not be emitted? + } // NewFD and OldDecl represent declarations that need to be // merged. if (MergeFunctionDecl(NewFD, OldDecl, S, MergeTypeWithPrevious)) {
might detect what you need. Can we perhaps use those checks there (when we have already detected a redeclaration) to perhaps set new members (to be added) on FunctionDecl that they shouldn't be emitted or not because they are this weird case? Then CodeGenFunction::GenerateCode can simply check that flag, then erase the existing function (or not generate it in the first place by setting a flag on OldDecl perhaps?). No redecl walking for every FunctionDecl required.
clang/test/CodeGen/user-func-gnu-inline-redecl.c | ||
---|---|---|
20 | this test passes before this patch is applied; I wonder if we have existing coverage in tree for this case? Surprisingly, I don't think we do. Perhaps gnu-inline-redecl.c might be a more concise test name? I can't help but shake the feeling that the builtin id stuff is a degenerate case of how GCC treats redeclarations of extern inline (gnu_inline) functions and that perhaps by solving just that, fixing the case of builtins might just "fall out" from that. |
clang/test/CodeGen/user-func-gnu-inline-redecl.c | ||
---|---|---|
20 | Clang indeed naturally handles gnu_inline in a decent way. The problem we're trying to solve now is a side effect of the premature renaming of function call site when we think it's a direct call to inline builtin. I've updated the implementation to avoid walking redecls. |
The downside to this approach is that we can now handle less ctor initializers because we need to steal a bit from there. We're going from allowing 2,097,152 ctor inits to 1,048,576, which is still a pretty large number of ctor inits, so I think this new approach is defensible. However, this does make it that much harder to add any new bits in the future. I agree that walking the redecl chain could potentially cause a performance issue, but that's speculative without some measurements. Because we don't have those measurements, I'm actually a bit more comfortable with walking the redecl chain than the current approach -- if we measure performance and find that walking this chain is a bottleneck, then it makes it more obvious that the new approach is worthwhile.
I second @aaron.ballman there. I compiled the sqlite3.c amalgamation, -O0, with both approach, measuring the number of instructions as gathered by valgrind --tool=callgrind
- when walking redecls: 9001630039 instructions, I changed the implementation a bit down to 9001628850
- when storing redecl state: 9000816370 instructions
Here's a [hastily and poorly written] script to measure the average cycle counts for 30 invocations using linux perf: https://gist.github.com/nickdesaulniers/4a20ba10c26ac2ad02cb0425b8b0f826
For Diff 382671 (latest; storing redecl state), builds of the linux kernel x86_64 defconfig+CONFIG_FORTIFY_SOURCE=y:
$ /tmp/measure_30.sh 'make LLVM=1 -j72' 'make LLVM=1 -j72 clean' ... Average of 30 runs: 10780685107280.83 cycles
For Diff 382344 (earlier; walking redecl chain), builds of the linux kernel x86_64 defconfig+CONFIG_FORTIFY_SOURCE=y:
$ /tmp/measure_30.sh 'make LLVM=1 -j72' 'make LLVM=1 -j72 clean' ... Average of 30 runs: 10745227016663.00 cycles
Damn, so what I proposed was slower, at least for the major case that I care about. I suspect that perhaps there's more forward declarations than actual functions we end up generating IR for, perhaps...either way, I'm sorry for suggesting the "storing redecl state" approach. If we want to go back to the earlier version in Diff 382344, at this point, I'd be happy to accept that revision. Sorry for causing whiplash on this.
Thank you for measuring this as well! And no worries on the whiplash (easy for me to say as a reviewer, hah) -- I think it was a reasonable thought to explore and measure the performance of. FWIW, I'd be happy accepting the earlier revision as well.
Re-uploading previous version that walks redef, with a slight change in the walking algorithm.
LGTM aside from a formatting nit. I don't think the precommit CI failures are related to your patch from what I was seeing, but may be worth keeping an eye on once you land just in case.
clang/lib/CodeGen/CodeGenFunction.cpp | ||
---|---|---|
1323 | Please fix the formatting. |
I don't think we want to do all this work if just Fn; ie. create a new std::string with .inline suffix for every function we're going to generate code (IR) for. How about we add an additional unlikely guard:
if (FD->getBuiltinID() && FN) {
Because in the usual case, FD both has a builtin ID and is an inline builtin declaration, while in the exceptional case that this patch addresses, FD has a builtin ID but is not an inline builtin declaration.