This is an archive of the discontinued LLVM Phabricator instance.

Demonstrate how to fix freestanding for memcpy
AbandonedPublic

Authored by gchatelet on Apr 15 2019, 9:45 AM.

Details

Reviewers
courbet
Summary

This is WIP and not to submit as is. TODO:

  • add unit tests
  • fix other mem functions (memset, memcmp, ...)

Event Timeline

gchatelet created this revision.Apr 15 2019, 9:45 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptApr 15 2019, 9:45 AM
gchatelet retitled this revision from Fixing freestanding for memcpy. to Demonstrate how to fix freestanding for memcpy.Apr 15 2019, 9:48 AM
gchatelet edited the summary of this revision. (Show Details)

As discussed offline, I think this should go through an RFC process.

I guess the main reservation that people will have is that this might generate a very large number of load stores (because we don't have a good way to generate loops here). This is not an issue for X86 because of REPMOVS, but it would be cool to have the opinion of people familiar with other architectures. One way to tackle this issue could be to expand existing mem* functions before the DAG. This is already happening for memcmp in ExpandMemcmp, which could be made aware of this flag. Similar approaches could be used for other mem* functions. But eventually this has to be handled here as the dag itself can call getMemcpy().

Also, the current approach does not protect against current users and future unsavvy users calling SelectionDAG::getMemcpy with AlwaysAlign == false, so what about actually retrieving the flag within the function instead of outside ? This happens e.g. in SelectionDAG::visitMemPCpyCall()m AArch64TargetLowering::LowerCall and others.

clang/test/CodeGen/freestanding-disables-libc.c
6

That's the responsibility of LLVM, so this test should be in llvm/test/Codegen.

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
6093

Please add an error message (maybe "modules with 'force-inline-libc' should never emit library calls")

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
5483

isForceInlineLibc

I think it'd be pretty unpopular with the people I know who use freestanding. They're mostly working on microcontrollers and compiling -Oz so the extra code size would be untenable; they also have memcpy implementations anyway because they use it in their own code.

If I was trying to solve this (and I'm also not 100% sure it needs solving), I think I'd instead generate a linkonce definition of memcpy whenever it's needed and leave CodeGen unmodified.

I think it'd be pretty unpopular with the people I know who use freestanding. They're mostly working on microcontrollers and compiling -Oz so the extra code size would be untenable; they also have memcpy implementations anyway because they use it in their own code.

If I was trying to solve this (and I'm also not 100% sure it needs solving), I think I'd instead generate a linkonce definition of memcpy whenever it's needed and leave CodeGen unmodified.

Thx for chiming in @t.p.northover . Since the patch was mostly a proof of concept I haven't explained well what the intent was.
Ultimately I'm interested in implementing libc's mem function in C++. Let's take memcpy for instance, it would be composed out of fixed sized copies created from __builtin_memcpy. Unfortunately, the compiler is allowed to replace the intrinsic with a call to memcpy - which I'm currently defining - the generated code would recurse indefinitely.

IIUC freestanding environment should not rely on memcpy being present so my take on it was that by "fixing" freestanding I could have my cake and eat it too.

There are other options to explore but your comment shows that it's better to start a discussion around an RFC.

IIUC freestanding environment should not rely on memcpy being present so my take on it was that by "fixing" freestanding I could have my cake and eat it too.

The formal situation is that freestanding implementations are only required to provide language support stuff like va_list. They're allowed to give more of the standard library if they want though, as implementation defined behaviour.

In practice, everyone I know provides the basic string.h functions and the compiler is pretty explicit about relying on them being present for correctness. They're part of the surrounding environment a user is expected to supply (much like the various exception handling callbacks if you want C++ exceptions, but always required).

For the people actually using freestanding I think they're mostly considered important enough that they're implemented in assembly anyway so this isn't really a burden, but...

Ultimately I'm interested in implementing libc's mem function in C++. Let's take memcpy for instance

Ah, that is an entirely different problem from what I thought you were trying to do. In that case I'm all in favour of some solution, but would start thinking along the lines of -fno-builtin-memcpy options (it's possible that already does what you want, but can't get LLVM to form a memcpy from quick tests to check).

IIUC freestanding environment should not rely on memcpy being present so my take on it was that by "fixing" freestanding I could have my cake and eat it too.

The formal situation is that freestanding implementations are only required to provide language support stuff like va_list. They're allowed to give more of the standard library if they want though, as implementation defined behaviour.

In practice, everyone I know provides the basic string.h functions and the compiler is pretty explicit about relying on them being present for correctness. They're part of the surrounding environment a user is expected to supply (much like the various exception handling callbacks if you want C++ exceptions, but always required).

For the people actually using freestanding I think they're mostly considered important enough that they're implemented in assembly anyway so this isn't really a burden, but...

Ack. "Fixing" freestanding is not the way to go then : )

Ultimately I'm interested in implementing libc's mem function in C++. Let's take memcpy for instance

Ah, that is an entirely different problem from what I thought you were trying to do. In that case I'm all in favour of some solution, but would start thinking along the lines of -fno-builtin-memcpy options (it's possible that already does what you want, but can't get LLVM to form a memcpy from quick tests to check).

Thx for taking the time to suggest possible solution, I really appreciate it.

I already tried -fno-builtin-memcpy. The semantic of this flag is pretty counter intuitive and I'll explain why it doesn't fit my needs.
LLVM is allowed to replace a piece of code that looks like a memcpy with an IR intrinsic that implements the same semantic. You can see this by inspecting the IR here: https://godbolt.org/z/0y1Yqh.
Now if you use -fno-builtin-memcpy you're preventing the compiler from understanding that this is a memory copy. Look at the IR here: https://godbolt.org/z/lnCIIh.
The vectorizer kicks in in this case and the generated code is quite good but sometimes the generated code is pretty bad: https://godbolt.org/z/mHpAYe.

Last but not least -fno-builtin-memcpyprevents the compiler from understanding code constructs as memcpy but does not prevent the compiler from generating calls to memcpy:

Another solution is needed for my use case, either a new C++ intrinsics __builtin_inline_memcpy or an attribute that disables generated libc calls, compiler could either inline or fail with an error.

gchatelet abandoned this revision.Apr 26 2019, 5:44 AM