This is an archive of the discontinued LLVM Phabricator instance.

PreISelIntrinsicLowering: don't expand memcpys in minsize functions, even with no-builtins.
AbandonedPublic

Authored by aemerson on Jul 20 2023, 12:27 AM.

Details

Summary

Even in the presence of -fno-builtin, users may directly call memcpy via the
builtin. In this case expanding will cause a size increase, so just avoid
expanding if the caller is a minsize function.

Diff Detail

Event Timeline

aemerson created this revision.Jul 20 2023, 12:27 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 20 2023, 12:27 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
aemerson requested review of this revision.Jul 20 2023, 12:27 AM
arsenm added inline comments.Jul 20 2023, 4:23 AM
llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp
201

Lowering the intrinsic when the libcall isn't available is mandatory, you can't simply skip it.

For minsize, you could adjust the default size threshold. Also, we could inject into the module an implementation you expand into, so the expansion happens fewer times

llvm/test/CodeGen/ARM/no-expand-memcpy-minsize-no-builtins.ll
10

I don't understand this, minsize can't override an explicit don't-do-this

aemerson added inline comments.Jul 20 2023, 10:00 AM
llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp
201

no-builtins is attached to the Function with -fno-builtins right? However the semantics of -fno-builtins is not AFAICT that we need to expand all memcpy intrinsics. The semantics are that clang shouldn't try to recognize things like memcpy loops and turn them into intrinsics. Or that plain memcpy calls shouldn't be converted into intrinsics.

aemerson added subscribers: gchatelet, efriedma.

@efriedma @gchatelet do you happen to know what the expectation here is if we have a memcpy intrinsic with -fno-builtins/"no-builtins" attribute?

arsenm added inline comments.Jul 20 2023, 4:22 PM
llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp
201

This whole thing is a confusing mess, there's a thread about it going on over in D74162.

"no-builtins" isn't in the LangRef and I don't really understand how it's supposed to differ from nobuiltin.

I think we have 2 blurred concepts of which calls are recognized and which calls can be emitted. The call here should probably be considering "RuntimeLibcalls" not TLI, but that doesn't seem to be written down anywhere

As far as I know, before 3c848194f28decca41b7362f9dd35d4939797724, things worked the following way:

  • LLVM optimizations wouldn't introduce llvm.memcpy calls into nobuiltin functions.
  • Any llvm.memcpy calls written explicitly in the input IR expand to "memcpy" (which we assume the user provides if necessary).

Maybe you could argue the latter behavior is a "bug", but it's behavior we provided for many years, and users depend on it. (And it's gcc-compatible.)

3c848194f28decca41b7362f9dd35d4939797724 changes things so that instead of calling "memcpy", we generate a bunch of terrible inline code. This apparently landed without review?

As far as I know, before 3c848194f28decca41b7362f9dd35d4939797724, things worked the following way:

  • LLVM optimizations wouldn't introduce llvm.memcpy calls into nobuiltin functions.
  • Any llvm.memcpy calls written explicitly in the input IR expand to "memcpy" (which we assume the user provides if necessary).

Maybe you could argue the latter behavior is a "bug", but it's behavior we provided for many years, and users depend on it. (And it's gcc-compatible.)

I've considered that the codegen pipeline can't deal with freestanding environments for fundamental intrinsics to be a major flaw.

I think the fix that avoids the current problem, and solves my desires is to move from checking TargetLibraryInfo.hasLibFunc(memcpy) to checking if TargetLoweringInfo::getLibcallName(memcpy) is valid (plus maybe some additional libcall-is-valid-for-addrspace checks?). Sorting out the general libcall mess for freestanding is a problem for another day

Independently of that, the lowering here could improve to optionally emit outlined implementations depending on target preference + minsize/optsize.

3c848194f28decca41b7362f9dd35d4939797724 changes things so that instead of calling "memcpy", we generate a bunch of terrible inline code.

This apparently landed without review?

This was D152383

I think the fix that avoids the current problem, and solves my desires is to move from checking TargetLibraryInfo.hasLibFunc(memcpy) to checking if TargetLoweringInfo::getLibcallName(memcpy) is valid (plus maybe some additional libcall-is-valid-for-addrspace checks?). Sorting out the general libcall mess for freestanding is a problem for another day

So conventional CPU targets use "memcpy", GPUs and other exotic targets use the new expansion codepath? That makes sense, I guess.

3c848194f28decca41b7362f9dd35d4939797724 changes things so that instead of calling "memcpy", we generate a bunch of terrible inline code.

This apparently landed without review?

This was D152383

Sorry, missed that.

As far as I know, before 3c848194f28decca41b7362f9dd35d4939797724, things worked the following way:

  • LLVM optimizations wouldn't introduce llvm.memcpy calls into nobuiltin functions.
  • Any llvm.memcpy calls written explicitly in the input IR expand to "memcpy" (which we assume the user provides if necessary).

Maybe you could argue the latter behavior is a "bug", but it's behavior we provided for many years, and users depend on it. (And it's gcc-compatible.)

3c848194f28decca41b7362f9dd35d4939797724 changes things so that instead of calling "memcpy", we generate a bunch of terrible inline code. This apparently landed without review?

As far as I know, before 3c848194f28decca41b7362f9dd35d4939797724, things worked the following way:

  • LLVM optimizations wouldn't introduce llvm.memcpy calls into nobuiltin functions.
  • Any llvm.memcpy calls written explicitly in the input IR expand to "memcpy" (which we assume the user provides if necessary).

Maybe you could argue the latter behavior is a "bug", but it's behavior we provided for many years, and users depend on it. (And it's gcc-compatible.)

I've considered that the codegen pipeline can't deal with freestanding environments for fundamental intrinsics to be a major flaw.

I think the fix that avoids the current problem, and solves my desires is to move from checking TargetLibraryInfo.hasLibFunc(memcpy) to checking if TargetLoweringInfo::getLibcallName(memcpy) is valid (plus maybe some additional libcall-is-valid-for-addrspace checks?). Sorting out the general libcall mess for freestanding is a problem for another day

Are you working on this? If not we'd like to get this fixed and have the previous behavior restored.

Independently of that, the lowering here could improve to optionally emit outlined implementations depending on target preference + minsize/optsize.

3c848194f28decca41b7362f9dd35d4939797724 changes things so that instead of calling "memcpy", we generate a bunch of terrible inline code.

This apparently landed without review?

This was D152383

arsenm added a comment.Aug 1 2023, 2:46 PM

I think the fix that avoids the current problem, and solves my desires is to move from checking TargetLibraryInfo.hasLibFunc(memcpy) to checking if TargetLoweringInfo::getLibcallName(memcpy) is valid (plus maybe some additional libcall-is-valid-for-addrspace checks?). Sorting out the general libcall mess for freestanding is a problem for another day

Are you working on this? If not we'd like to get this fixed and have the previous behavior restored.

No, I thought you were since you posted the patch

I think the fix that avoids the current problem, and solves my desires is to move from checking TargetLibraryInfo.hasLibFunc(memcpy) to checking if TargetLoweringInfo::getLibcallName(memcpy) is valid (plus maybe some additional libcall-is-valid-for-addrspace checks?). Sorting out the general libcall mess for freestanding is a problem for another day

Are you working on this? If not we'd like to get this fixed and have the previous behavior restored.

No, I thought you were since you posted the patch

Ok got it.

hiraditya added inline comments.Aug 9 2023, 4:02 PM
llvm/test/CodeGen/ARM/no-expand-memcpy-minsize-no-builtins.ll
10

yeah, this could result in errors where there is no memcpy implementation.

arsenm added a comment.Aug 9 2023, 6:01 PM

I think the fix that avoids the current problem, and solves my desires is to move from checking TargetLibraryInfo.hasLibFunc(memcpy) to checking if TargetLoweringInfo::getLibcallName(memcpy) is valid (plus maybe some additional libcall-is-valid-for-addrspace checks?). Sorting out the general libcall mess for freestanding is a problem for another day

Are you working on this? If not we'd like to get this fixed and have the previous behavior restored.

No, I thought you were since you posted the patch

Ok got it.

I interpreted this as you were doing it but I did this in D157567

aemerson added a comment.EditedAug 9 2023, 9:05 PM

I think the fix that avoids the current problem, and solves my desires is to move from checking TargetLibraryInfo.hasLibFunc(memcpy) to checking if TargetLoweringInfo::getLibcallName(memcpy) is valid (plus maybe some additional libcall-is-valid-for-addrspace checks?). Sorting out the general libcall mess for freestanding is a problem for another day

Are you working on this? If not we'd like to get this fixed and have the previous behavior restored.

No, I thought you were since you posted the patch

Ok got it.

I interpreted this as you were doing it but I did this in D157567

Yeah I was going to, I'm just on vacation so hadn't gotten around to it. Thanks!

arsenm resigned from this revision.Aug 10 2023, 1:41 PM

This can be abandoned

aemerson abandoned this revision.Sep 5 2023, 8:44 AM