Even in the presence of -fno-builtin, users may directly call memcpy via the
builtin. In this case expanding will cause a size increase, so just avoid
expanding if the caller is a minsize function.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Time | Test | |
---|---|---|
80 ms | x64 windows > Clang.SemaCXX::static-assert-cxx26.cpp |
Event Timeline
llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp | ||
---|---|---|
201 | Lowering the intrinsic when the libcall isn't available is mandatory, you can't simply skip it. For minsize, you could adjust the default size threshold. Also, we could inject into the module an implementation you expand into, so the expansion happens fewer times | |
llvm/test/CodeGen/ARM/no-expand-memcpy-minsize-no-builtins.ll | ||
10 | I don't understand this, minsize can't override an explicit don't-do-this |
llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp | ||
---|---|---|
201 | no-builtins is attached to the Function with -fno-builtins right? However the semantics of -fno-builtins is not AFAICT that we need to expand all memcpy intrinsics. The semantics are that clang shouldn't try to recognize things like memcpy loops and turn them into intrinsics. Or that plain memcpy calls shouldn't be converted into intrinsics. |
@efriedma @gchatelet do you happen to know what the expectation here is if we have a memcpy intrinsic with -fno-builtins/"no-builtins" attribute?
llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp | ||
---|---|---|
201 | This whole thing is a confusing mess, there's a thread about it going on over in D74162. "no-builtins" isn't in the LangRef and I don't really understand how it's supposed to differ from nobuiltin. I think we have 2 blurred concepts of which calls are recognized and which calls can be emitted. The call here should probably be considering "RuntimeLibcalls" not TLI, but that doesn't seem to be written down anywhere |
As far as I know, before 3c848194f28decca41b7362f9dd35d4939797724, things worked the following way:
- LLVM optimizations wouldn't introduce llvm.memcpy calls into nobuiltin functions.
- Any llvm.memcpy calls written explicitly in the input IR expand to "memcpy" (which we assume the user provides if necessary).
Maybe you could argue the latter behavior is a "bug", but it's behavior we provided for many years, and users depend on it. (And it's gcc-compatible.)
3c848194f28decca41b7362f9dd35d4939797724 changes things so that instead of calling "memcpy", we generate a bunch of terrible inline code. This apparently landed without review?
I've considered that the codegen pipeline can't deal with freestanding environments for fundamental intrinsics to be a major flaw.
I think the fix that avoids the current problem, and solves my desires is to move from checking TargetLibraryInfo.hasLibFunc(memcpy) to checking if TargetLoweringInfo::getLibcallName(memcpy) is valid (plus maybe some additional libcall-is-valid-for-addrspace checks?). Sorting out the general libcall mess for freestanding is a problem for another day
Independently of that, the lowering here could improve to optionally emit outlined implementations depending on target preference + minsize/optsize.
3c848194f28decca41b7362f9dd35d4939797724 changes things so that instead of calling "memcpy", we generate a bunch of terrible inline code.
This apparently landed without review?
This was D152383
So conventional CPU targets use "memcpy", GPUs and other exotic targets use the new expansion codepath? That makes sense, I guess.
3c848194f28decca41b7362f9dd35d4939797724 changes things so that instead of calling "memcpy", we generate a bunch of terrible inline code.
This apparently landed without review?
This was D152383
Sorry, missed that.
Are you working on this? If not we'd like to get this fixed and have the previous behavior restored.
Independently of that, the lowering here could improve to optionally emit outlined implementations depending on target preference + minsize/optsize.
3c848194f28decca41b7362f9dd35d4939797724 changes things so that instead of calling "memcpy", we generate a bunch of terrible inline code.
This apparently landed without review?
This was D152383
llvm/test/CodeGen/ARM/no-expand-memcpy-minsize-no-builtins.ll | ||
---|---|---|
10 | yeah, this could result in errors where there is no memcpy implementation. |
Lowering the intrinsic when the libcall isn't available is mandatory, you can't simply skip it.
For minsize, you could adjust the default size threshold. Also, we could inject into the module an implementation you expand into, so the expansion happens fewer times