Page MenuHomePhabricator

Linux support for mimalloc as a custom allocator
Needs ReviewPublic

Authored by michaeljclark on Apr 28 2021, 12:34 AM.

Details

Summary

Adds support for mimalloc with static linking on Linux, for testing purposes, the following combination works:

- -DLLVM_INTEGRATED_CRT_ALLOC=mimalloc and -DLLVM_LINK_LLVM_DYLIB=OFF

I have been working on testing rpmalloc and mimalloc as custom allocators with clang on Linux. Neither of them works out of the box with dynamic linking due to a hang bug in glibc 2.31 (Ubuntu 20.04LTS) elf/rtld.c:_dl_start_final which I have not yet debugged fully. As a result this patch just adds support for mimalloc with static linking (LLVM_LINK_LLVM_DYLIB=OFF) which was the only configuration I could get working.

Most of the patch is implementing CMake constraints conditions similar to the Windows support so that CMake will report errors for combinations that do not work, so it checks LLVM_LINK_LLVM_DYLIB and only allows mimalloc on Linux. The patch expects mimalloc to be checked out next to the LLVM directory same as the Windows support. Unlike the windows patch, mimalloc is added as a CMake subdirectory and mimalloc-static is properly exported into LLVM's CMake.

It adds the following error messages:

  • LLVM_INTEGRATED_CRT_ALLOC is only supported on Windows and Linux.
  • LLVM_INTEGRATED_CRT_ALLOC currently only supports mimalloc on Linux!"
  • LLVM_INTEGRATED_CRT_ALLOC cannot be used with LLVM_LINK_LLVM_DYLIB on Linux!"

Diff Detail

Event Timeline

michaeljclark created this revision.Apr 28 2021, 12:34 AM
michaeljclark requested review of this revision.Apr 28 2021, 12:34 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 28 2021, 12:34 AM
mjp41 added a subscriber: mjp41.Apr 29 2021, 12:14 AM

I have been working on testing rpmalloc and mimalloc as custom allocators with clang on Linux. Neither of them works out of the box with dynamic linking due to a hang bug in glibc 2.31 (Ubuntu 20.04LTS) elf/rtld.c:_dl_start_final which I have not yet debugged fully.

+@maniccoder

Which version of rpmalloc did you use?

I tried building with integrated rpmalloc (using the latest develop branch) and while the compiled toolchain seemed to work fine (clang, lld) I ran into issues in the llvm test suite where it passed what seemed to be a global data segment pointer to delete operator. I will keep digging into this.

Ping! I did a some performance tests several months ago, and it seems rpmalloc was generally faster than glibc malloc, so this patch seems beneficial (to avoid relying on the LD_PRELOAD thing).
@maniccoder Did you manage by chance to look at the issue you mentioned?

No, unfortunately not, and it fell between the cracks of switching jobs and focusing other personal projects. However, this sparked my interest again so I will give it a go again and ping back here what I find.

I am interested in having this for Linux, too :) What is blocking this?

iirc the problem was that the LLVM test suite failed on Linux when using rpmalloc as allocator due it what seemed to be static global data being passed to free(), causing the allocator to crash. Either the allocator was doing something wrong, causing this behaviour, or there was something fishy going on somewhere else. I didn't manage to find the root cause back then.

I recently started looking into this again, will post more here when I have some news.

Perhaps we don't need additional CMake code for using an alternative malloc on Linux.
For example, to use mimalloc it is simply -DCMAKE_EXE_LINKER_FLAGS=-Wl,--push-state,--whole-archive,path/to/libmimalloc.a,--pop-state

When linking a default build of chrome (no debug) and a Release build of clang, using libmimalloc.a is 1.12x as fast. LD_PRELOAD=path/to/libmimalloc.so is 1.11x as fast.

Perhaps we don't need additional CMake code for using an alternative malloc on Linux.
For example, to use mimalloc it is simply -DCMAKE_EXE_LINKER_FLAGS=-Wl,--push-state,--whole-archive,path/to/libmimalloc.a,--pop-state

Does this work with -DLLVM_LINK_LLVM_DYLIB=ON?

The benefit of adding a build system option is that it encodes this knowledge within the build system. For example, it took me quite a long time to figure out that glibc malloc_hooks are not thread-safe and broken if using shared libraries. If this approach works then perhaps it could be added as one way to do this, but honestly, I find it a bit messy to have a built archive dependency and would prefer to have that as an option along with a standard subdirectory include as there is for Clang and the other modular tools and library dependencies. I am making changes to the allocator and testing them within LLVM so it is quite good to have CMake rebuild the dependency if there are any changes.

One approach might be to add a potential LLVM_INTEGRATED_CRT_ALLOC_LIB or LLVM_INTEGRATED_CRT_ALLOC_DYLIB config option that points to an archive or dynamic library, or one variable with suffix detection. It may or may not be possible to use the project-wide linker flags due to address or memory sanitizer as it could interfere with their interposition mechanism? I do not know. I guess it needs testing.

When linking a default build of chrome (no debug) and a Release build of clang, using libmimalloc.a is 1.12x as fast. LD_PRELOAD=path/to/libmimalloc.so is 1.11x as fast.

Indeed, I found this to be the case too.

I am interested in having this for Linux, too :) What is blocking this?

IDK. I just posted the patch for review because I thought folks might find it useful.

I did not get to the bottom of why it was not working (for me at least) with -DLLVM_LINK_LLVM_DYLIB=ON. It is so long ago I can't remember what the issue was. It may have been a segfault in the glibc dynamic linker. Of course there is musl libc to think about and test too. It might be that we need to build mimalloc shared if -DLLVM_BUILD_LLVM_DYLIB=ON and link mimalloc shared if -DLLVM_LINK_LLVM_DYLIB=ON

I needed to add the following to build latest LLVM with latest mimalloc as the CMakeLists.txt might not have been building mimalloc-static by default.

diff --git a/llvm/lib/Support/CMakeLists.txt b/llvm/lib/Support/CMakeLists.txt
index dc7c205a6904..87cb029f97c1 100644
--- a/llvm/lib/Support/CMakeLists.txt
+++ b/llvm/lib/Support/CMakeLists.txt
@@ -95,6 +95,9 @@ if(LLVM_INTEGRATED_CRT_ALLOC)
   elseif(LLVM_INTEGRATED_CRT_ALLOC MATCHES "mimalloc$")
     if(CMAKE_SYSTEM_NAME STREQUAL "Linux")
       set(MI_OVERRIDE ON CACHE BOOL "Override the standard malloc")
+      set(MI_BUILD_STATIC ON CACHE BOOL "Build static library")
+      set(MI_BUILD_SHARED OFF CACHE BOOL "Build shared library")
+      set(MI_BUILD_TESTS OFF CACHE BOOL "Build tests")
       add_subdirectory(${LLVM_INTEGRATED_CRT_ALLOC} mimalloc)
       export(EXPORT mimalloc)
       set(system_libs ${system_libs} "mimalloc-static")

Perhaps we don't need additional CMake code for using an alternative malloc on Linux.
For example, to use mimalloc it is simply -DCMAKE_EXE_LINKER_FLAGS=-Wl,--push-state,--whole-archive,path/to/libmimalloc.a,--pop-state

Does this work with -DLLVM_LINK_LLVM_DYLIB=ON?

Yes. The linker will export symbols like malloc to .dynsym in the executable. At run-time, the executable malloc will interpose undefined references from libLLVM-14git.so.

In a -DLLVM_TARGETS_TO_BUILD=X86 -DCMAKE_EXE_LINKER_FLAGS=-Wl,--push-state,/tmp/p/mimalloc/out/release/libmimalloc.a,--pop-state -DLLVM_LINK_LLVM_DYLIB=ON build, `ninja check-lld-elf passes.

The benefit of adding a build system option is that it encodes this knowledge within the build system. For example, it took me quite a long time to figure out that glibc malloc_hooks are not thread-safe and broken if using shared libraries. If this approach works then perhaps it could be added as one way to do this, but honestly, I find it a bit messy to have a built archive dependency and would prefer to have that as an option along with a standard subdirectory include as there is for Clang and the other modular tools and library dependencies. I am making changes to the allocator and testing them within LLVM so it is quite good to have CMake rebuild the dependency if there are any changes.

This is useful when there is non-trivial logic.
For example, if we need to disable some other features or report a diagnostic when two features are incompatible, having such a toggle may be useful.

For Linux, linking libmimalloc appears a completely orthogonal feature to me.
llvm-project has hundreds of CMake variables and most people don't know most options.
Having this option may just add yet another option which people don't know.

One approach might be to add a potential LLVM_INTEGRATED_CRT_ALLOC_LIB or LLVM_INTEGRATED_CRT_ALLOC_DYLIB config option that points to an archive or dynamic library, or one variable with suffix detection. It may or may not be possible to use the project-wide linker flags due to address or memory sanitizer as it could interfere with their interposition mechanism? I do not know. I guess it needs testing.

When linking a default build of chrome (no debug) and a Release build of clang, using libmimalloc.a is 1.12x as fast. LD_PRELOAD=path/to/libmimalloc.so is 1.11x as fast.

Indeed, I found this to be the case too.

Thanks for confirmation.

For Linux, linking libmimalloc appears a completely orthogonal feature to me.
llvm-project has hundreds of CMake variables and most people don't know most options.
Having this option may just add yet another option which people don't know.

The case here is not so much adding a new option, rather it is about unmasking an option that already exists but doesn't yet support Linux. Take a look at the existing LLVM_INTEGRATED_CRT_ALLOC in llvm/lib/support/CMakeLists.txt. The other allocators are compiled from source. I think having a choice of source or built archive is reasonable but it probably should be a consistent option for all of the allocators, assuming testing is done and they are unmasked after verification. I don't see mimalloc going into the monorepo like clang, so in that respect it is different, so ExternalProject_Add might be an interesting option, as it is useful if the dependency is not in the repo but one wants it to be automatically fetched and built. I like that option because it can be made to "just work" by default assuming there is network connectivity. Also having an ExternalProject_Add git URL option allows the project url to be overridden and point to a local path, which is how I would use it. I believe the CMake build targets for ExternalProject_Add end up in the project namespace so you don't need to hardcode paths or at least the paths are platform-independent. The current Windows mimalloc support appears to hardcode architecture-specific paths.

Based on your link command, then perhaps I need to use something like this on Linux:

-Wl,--push-state,${PROJECT_BINARY_DIR}/lib/libmimalloc.a,--pop-state

I'll do some testing when I get time to look at this again. I could look at testing a patch that links mimalloc consistently between Linux and Windows perhaps using ExternalProject_Add. ExternalProject_Add would suit me and perhaps other users wanting to try mimalloc, and it seems to be used elsewhere in the LLVM build so there is precedence for using it. It would be better than the hardcoded paths which is what is there at present for the Windows mimalloc support.

This is patch is interesting to me because builds of llvm statically linked against musl have numerous memory allocation related functions in their top 20 functions sampled in builds of the Linux kernel.

If I apply the following diff on top of this diff, I can build rpmalloc into LLVM:

diff --git a/llvm/CMakeLists.txt b/llvm/CMakeLists.txt
index 96cd28afe80b..bd5b73361d2b 100644
--- a/llvm/CMakeLists.txt
+++ b/llvm/CMakeLists.txt
@@ -600,7 +600,8 @@ if(LLVM_INTEGRATED_CRT_ALLOC)
   if(NOT (CMAKE_SYSTEM_NAME MATCHES "(Linux|Windows)"))
     message(FATAL_ERROR "LLVM_INTEGRATED_CRT_ALLOC is only supported on Windows and Linux.")
   elseif(CMAKE_SYSTEM_NAME STREQUAL "Linux")
-    if(NOT (LLVM_INTEGRATED_CRT_ALLOC MATCHES "mimalloc$"))
+    if(LLVM_INTEGRATED_CRT_ALLOC MATCHES "rpmalloc$")
+    elseif(NOT (LLVM_INTEGRATED_CRT_ALLOC MATCHES "mimalloc$"))
       message(FATAL_ERROR "LLVM_INTEGRATED_CRT_ALLOC currently only supports mimalloc on Linux!")
     elseif(LLVM_BUILD_LLVM_DYLIB AND LLVM_LINK_LLVM_DYLIB)
       message(FATAL_ERROR "LLVM_INTEGRATED_CRT_ALLOC cannot be used with LLVM_LINK_LLVM_DYLIB on Linux!")
diff --git a/llvm/lib/Support/CMakeLists.txt b/llvm/lib/Support/CMakeLists.txt
index 37743a583860..5d84e3040c11 100644
--- a/llvm/lib/Support/CMakeLists.txt
+++ b/llvm/lib/Support/CMakeLists.txt
@@ -88,6 +88,9 @@ if(LLVM_INTEGRATED_CRT_ALLOC)
 
   if(LLVM_INTEGRATED_CRT_ALLOC MATCHES "rpmalloc$")
     add_definitions(-DENABLE_OVERRIDE -DENABLE_PRELOAD)
+    set_source_files_properties(
+      "${LLVM_INTEGRATED_CRT_ALLOC}/rpmalloc/rpmalloc.c"
+      PROPERTIES COMPILE_FLAGS -Wno-static-in-inline)
     set(ALLOCATOR_FILES "${LLVM_INTEGRATED_CRT_ALLOC}/rpmalloc/rpmalloc.c")
   elseif(LLVM_INTEGRATED_CRT_ALLOC MATCHES "snmalloc$")
     set(ALLOCATOR_FILES "${LLVM_INTEGRATED_CRT_ALLOC}/src/snmalloc/override/new.cc")

which is simply meant to be a quick hack to allow me to pass -DLLVM_INTEGRATED_CRT_ALLOC=/path/to/rpmalloc to my cmake invocation used to configure llvm. Result segfaults pretty quickly though trying to run the llvm unit tests though. :(

Herald added a project: Restricted Project. · View Herald TranscriptJun 21 2022, 7:14 PM
Herald added a subscriber: StephenFan. · View Herald Transcript

For non-Windows platforms, we still haven't seen a convincing argument to add more complexity while -DCMAKE_EXE_LINKER_FLAGS=-Wl,--push-state,path/to/mimalloc/out/release/libmimalloc.a,--pop-state works.