This is an archive of the discontinued LLVM Phabricator instance.

Fix inline builtin handling in case of redefinition
ClosedPublic

Authored by serge-sans-paille on Oct 19 2021, 2:33 AM.

Details

Summary

Basically, inline builtin definition are shadowed by externally visible
redefinition. This matches GCC behavior.

Diff Detail

Event Timeline

serge-sans-paille requested review of this revision.Oct 19 2021, 2:33 AM
serge-sans-paille created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptOct 19 2021, 2:33 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

thanks, I can verify that it fixes the crash we were seeing.

Yes; GCC does behave this way. It does not consider a non-gnu-inline redefinition an error, and it does seem to prefer the non-gnu-inline redeclaration when both are present, AFAICT. The test is verifying that behavior correctly. This patch is fixing the test case, the reported reduced cases from @manojgupta link, @nathanchance link, and myself link, and fixing the kernel builds link.

Further, I did a build+boot tests of:

  1. mainline x86_64 + CONFIG_FORTIFY_SOURCE=y
  2. mainline x86_64 + CONFIG_FORTIFY_SOURCE=y + CONFIG_LTO_CLANG_FULL=y
  3. mainline x86_64 + CONFIG_FORTIFY_SOURCE=y + CONFIG_LTO_CLANG_THIN=y
clang/lib/CodeGen/CodeGenFunction.cpp
1318

do we really want to be iterating every redaclaration like this, even for non-inline builtin declarations? Is there a way to avoid the below loop for most functions? Perhaps we should be doing this when we hit a redeclaration instead? Or wrap all this work in a check that there's a builtin ID associated with the FD?

1328

If there are multiple redeclarations, do we want to be erasing the clone each time, or can we break out of this loop?

clang/test/CodeGen/strlen-inline-builtin-redecl.c
11–13

Has this example been formatted? Does rotating the attributes to the front help?

43

I think this line can be dropped without affecting the test. Or consider using any of the further reduced test cases?

Reduce the number of time we would walk redecls.
Simplify test case

Avoid walking redecls

serge-sans-paille marked 4 inline comments as done.Oct 20 2021, 10:31 AM
ychen added a subscriber: ychen.Oct 20 2021, 10:54 AM
nickdesaulniers added inline comments.
clang/lib/CodeGen/CodeGenFunction.cpp
1301–1302

I don't think we want to do all this work if just Fn; ie. create a new std::string with .inline suffix for every function we're going to generate code (IR) for. How about we add an additional unlikely guard:

if (FD->getBuiltinID() && FN) {

Because in the usual case, FD both has a builtin ID and is an inline builtin declaration, while in the exceptional case that this patch addresses, FD has a builtin ID but is not an inline builtin declaration.

1315–1317

Perhaps in Sema::CheckFunctionDeclaration? I see there is where we detect redeclarations. The calls from there to FunctionDecl::setPreviousDeclaration() seem to set up the redecl chain. Perhaps this exceptional case (or both cases, even) would be handled better there?

cc @rsmith @aaron.ballman in case they have feedback/tips/cycles.

clang/test/CodeGen/strlen-inline-builtin-redecl.c
9

unused decl

11

Do you mind wrapping this to 80 chars wide? I suspect if you put the two __attribute__s first, then the formatter will do a better job. You can also combine these, a la __attribute__((always_inline, gnu_inline)) to cut down on line length.

aaron.ballman added inline comments.Oct 21 2021, 5:11 AM
clang/lib/CodeGen/CodeGenFunction.cpp
1301–1302

Is it correct to gate this on whether it's a builtin or not? I thought that builtin-like (e.g., the usual pile of attributes) user code should also have the same effect, shouldn't it?

1315–1317

I don't know that it's a good idea to modify the redeclaration chain in this case. The comments on the chain are pretty clear that it's a temporal chain where "previous" means previously declared in relation to the current declaration. @rsmith may feel differently, however.

Bumping for an update here. We can tolerate a build breakage for our older kernels over the weekend, but we should really try to get this resolved by EOW, otherwise we need to look into reverting:

clang/lib/CodeGen/CodeGenFunction.cpp
1301–1302

What do you mean? I'm sorry, I don't quite follow.

1315–1317

Sorry, I don't quite follow whether your approving of the current approach or dismissive?

aaron.ballman added inline comments.Oct 25 2021, 11:02 AM
clang/lib/CodeGen/CodeGenFunction.cpp
1301–1302

From the test cases below:

extern inline __attribute__((always_inline)) __attribute__((gnu_inline)) unsigned long strlen(const char *p) {
  return 1;
}
unsigned long mystrlen(char const *s) {
  return strlen(s);
}
unsigned long strlen(const char *s) {
  return 2;
}

These redeclarations resolve a particular way by GCC and this patch is intending to match that behavior. My question ultimately boils down to whether that also happens for this code, where the function is not a builtin but looks like one due to the attributes:

extern inline __attribute__((always_inline)) __attribute__((gnu_inline)) unsigned long wahoo(const char *p) {
  return 1;
}
unsigned long mywahoo(char const *s) {
  return wahoo(s);
}
unsigned long wahoo(const char *s) {
  return 2;
}

If this also reorders, then I don't think we can look at whether FD->getBuiltinID() != 0 to decide whether to do the reordering dance because arbitrary user functions aren't Clang builtins and so they'd not get the correct behavior. Does that make sense?

1315–1317

I don't think we should modify the redecl chain from CheckFunctionDeclaration() -- this case would create a redeclaration chain whose previous link was not temporally the previous declaration. There might be another approach so we can avoid replaceAllUsesWith(). One possibility (no idea how feasible or what explodes) is to modify FunctionDecl::getDefinition() to look through the chain to return the best definition when there are multiple definitions to pick from.

clang/lib/CodeGen/CodeGenFunction.cpp
1301–1302

If this also reorders

It does; https://godbolt.org/z/bbrox7f6e.

Does that make sense?

Yes, thanks for the additional info. In this case, I guess we can disregard my feedback that started this thread, marking it as done?

Perhaps @serge-sans-paille should add such a non-builtin test case as part of the change?

aaron.ballman added inline comments.Oct 25 2021, 11:59 AM
clang/lib/CodeGen/CodeGenFunction.cpp
1301–1302

I think you have a valid concern about the extra allocations, but I'm not certain of a better predicate to use. My intuition is that the overhead here won't be unacceptable, but it'd be good to know we're not regressing compile time performance significantly.

Additional test coverage with a comment as to why we're testing it is a good idea!

clang/lib/CodeGen/CodeGenFunction.cpp
1301–1302

Perhaps we can first test whether this FuctionDecl is a redecl, then do the allocation, then check if the .inline suffix?

That way we avoid creating the new string unless we're codgen'ing a redecl, which should be uncommon in practice.

aaron.ballman added inline comments.Oct 25 2021, 12:26 PM
clang/lib/CodeGen/CodeGenFunction.cpp
1301–1302

That could save us a bit of perf, but I think redecls are actually quite common because a definition is itself a declaration, so having a decl in a header file and defn in a source file will produce a redecl chain.

clang/lib/CodeGen/CodeGenFunction.cpp
1301–1302

Ah, ok then.

Additional test coverage with a comment as to why we're testing it is a good idea!

Yeah, with that and fixes to the other small nits in the test case, then I think this is ready to land.

  • Add a test case to ensure we keep the right behavior for non-intrinsic gnu inline
  • walk the redecl chain before doing an extra string alloc

Formatting nits

aaron.ballman accepted this revision.Oct 26 2021, 8:50 AM

LGTM, though codegen is not my area of expertise.

This revision is now accepted and ready to land.Oct 26 2021, 8:50 AM
nickdesaulniers accepted this revision.Oct 26 2021, 3:19 PM

Thank you for fixing this terrible edge case; LGTM.

Actually, it looks like:

diff --git a/clang/lib/Sema/SemaDecl.cpp b/clang/lib/Sema/SemaDecl.cpp
index 69d2ef631872..8e77cdef2ed5 100644
--- a/clang/lib/Sema/SemaDecl.cpp
+++ b/clang/lib/Sema/SemaDecl.cpp
@@ -10927,6 +10927,10 @@ bool Sema::CheckFunctionDeclaration(Scope *S, FunctionDecl *NewFD,
   }
 
   if (Redeclaration) {
+    if (cast<FunctionDecl>(OldDecl)->isInlineBuiltinDeclaration() && !NewFD->isInlineBuiltinDeclaration()) {
+      // Set a flag on NewFD that it's a shadowed gnu_inline that should be
+      // emitted, or on OldDecl that it should not be emitted?
+    }
     // NewFD and OldDecl represent declarations that need to be
     // merged.
     if (MergeFunctionDecl(NewFD, OldDecl, S, MergeTypeWithPrevious)) {

might detect what you need. Can we perhaps use those checks there (when we have already detected a redeclaration) to perhaps set new members (to be added) on FunctionDecl that they shouldn't be emitted or not because they are this weird case? Then CodeGenFunction::GenerateCode can simply check that flag, then erase the existing function (or not generate it in the first place by setting a flag on OldDecl perhaps?). No redecl walking for every FunctionDecl required.

clang/test/CodeGen/user-func-gnu-inline-redecl.c
21

this test passes before this patch is applied; I wonder if we have existing coverage in tree for this case?

Surprisingly, I don't think we do. Perhaps gnu-inline-redecl.c might be a more concise test name?

I can't help but shake the feeling that the builtin id stuff is a degenerate case of how GCC treats redeclarations of extern inline (gnu_inline) functions and that perhaps by solving just that, fixing the case of builtins might just "fall out" from that.

  • Use a FunctionDecl Attribute to store the shadowed inline redecl status
clang/test/CodeGen/user-func-gnu-inline-redecl.c
21

Clang indeed naturally handles gnu_inline in a decent way. The problem we're trying to solve now is a side effect of the premature renaming of function call site when we think it's a direct call to inline builtin. I've updated the implementation to avoid walking redecls.

  • Use a FunctionDecl Attribute to store the shadowed inline redecl status

The downside to this approach is that we can now handle less ctor initializers because we need to steal a bit from there. We're going from allowing 2,097,152 ctor inits to 1,048,576, which is still a pretty large number of ctor inits, so I think this new approach is defensible. However, this does make it that much harder to add any new bits in the future. I agree that walking the redecl chain could potentially cause a performance issue, but that's speculative without some measurements. Because we don't have those measurements, I'm actually a bit more comfortable with walking the redecl chain than the current approach -- if we measure performance and find that walking this chain is a bottleneck, then it makes it more obvious that the new approach is worthwhile.

I second @aaron.ballman there. I compiled the sqlite3.c amalgamation, -O0, with both approach, measuring the number of instructions as gathered by valgrind --tool=callgrind

  • when walking redecls: 9001630039 instructions, I changed the implementation a bit down to 9001628850
  • when storing redecl state: 9000816370 instructions

Here's a [hastily and poorly written] script to measure the average cycle counts for 30 invocations using linux perf: https://gist.github.com/nickdesaulniers/4a20ba10c26ac2ad02cb0425b8b0f826

For Diff 382671 (latest; storing redecl state), builds of the linux kernel x86_64 defconfig+CONFIG_FORTIFY_SOURCE=y:

$ /tmp/measure_30.sh 'make LLVM=1 -j72' 'make LLVM=1 -j72 clean'
...
Average of 30 runs: 10780685107280.83 cycles

For Diff 382344 (earlier; walking redecl chain), builds of the linux kernel x86_64 defconfig+CONFIG_FORTIFY_SOURCE=y:

$ /tmp/measure_30.sh 'make LLVM=1 -j72' 'make LLVM=1 -j72 clean'
...
Average of 30 runs: 10745227016663.00 cycles

Damn, so what I proposed was slower, at least for the major case that I care about. I suspect that perhaps there's more forward declarations than actual functions we end up generating IR for, perhaps...either way, I'm sorry for suggesting the "storing redecl state" approach. If we want to go back to the earlier version in Diff 382344, at this point, I'd be happy to accept that revision. Sorry for causing whiplash on this.

Here's a [hastily and poorly written] script to measure the average cycle counts for 30 invocations using linux perf: https://gist.github.com/nickdesaulniers/4a20ba10c26ac2ad02cb0425b8b0f826

For Diff 382671 (latest; storing redecl state), builds of the linux kernel x86_64 defconfig+CONFIG_FORTIFY_SOURCE=y:

$ /tmp/measure_30.sh 'make LLVM=1 -j72' 'make LLVM=1 -j72 clean'
...
Average of 30 runs: 10780685107280.83 cycles

For Diff 382344 (earlier; walking redecl chain), builds of the linux kernel x86_64 defconfig+CONFIG_FORTIFY_SOURCE=y:

$ /tmp/measure_30.sh 'make LLVM=1 -j72' 'make LLVM=1 -j72 clean'
...
Average of 30 runs: 10745227016663.00 cycles

Damn, so what I proposed was slower, at least for the major case that I care about. I suspect that perhaps there's more forward declarations than actual functions we end up generating IR for, perhaps...either way, I'm sorry for suggesting the "storing redecl state" approach. If we want to go back to the earlier version in Diff 382344, at this point, I'd be happy to accept that revision. Sorry for causing whiplash on this.

Thank you for measuring this as well! And no worries on the whiplash (easy for me to say as a reviewer, hah) -- I think it was a reasonable thought to explore and measure the performance of. FWIW, I'd be happy accepting the earlier revision as well.

Re-uploading previous version that walks redef, with a slight change in the walking algorithm.

aaron.ballman accepted this revision.Oct 29 2021, 4:08 AM

LGTM aside from a formatting nit. I don't think the precommit CI failures are related to your patch from what I was seeing, but may be worth keeping an eye on once you land just in case.

clang/lib/CodeGen/CodeGenFunction.cpp
1323

Please fix the formatting.

nickdesaulniers accepted this revision.Oct 29 2021, 10:09 AM

thanks again for all of the work that went into this!

This revision was landed with ongoing or failed builds.Nov 2 2021, 1:54 AM
This revision was automatically updated to reflect the committed changes.