This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Trim memory periodically
ClosedPublic

Authored by qchateau on Dec 17 2020, 4:40 AM.

Details

Summary

This diff addresses the issue of the ever increasing memory usage of clangd. The key to understand what happens is to use malloc_stats(): malloc arenas keep getting bigger, although the actual memory used does not. It seems some operations while bulding the indices (both dynamic and background) create this problem. Specifically, 'FileSymbols::update' and 'FileSymbols::buildIndex' seem especially affected.

This diff adds a call to malloc_trim(0) at the end of FileSymbols::buildIndex.

For reference:
https://github.com/clangd/clangd/issues/251
https://github.com/clangd/clangd/issues/115


I'm not sure how much I'm allowed to use GNU extensions but this diff has been a game changer for me and my 8GB RAM laptop.

In any case, I think I've properly diagnosed to issue. Once trimmed, the actual memory usage of the process is approximately what clangd reports with the "memoryUsage" feature.

The solution is either what I suggest here (use malloc_trim) or rework a ton of code, change containers and allocators and hope the problem won't happen again in the future as we add features.

Diff Detail

Event Timeline

qchateau created this revision.Dec 17 2020, 4:40 AM
qchateau requested review of this revision.Dec 17 2020, 4:40 AM

malloc_trim isn't available on windows so this wont work, so this should likely be ifdef'd out on windows builds. Side note is this memory failing to free behaviour observed on windows?

clang-tools-extra/clangd/index/FileIndex.cpp
51

It may be helpful to log how many times this call fails to free any memory.

Side note is this memory failing to free behaviour observed on windows?

No idea, I only develop on linux. Looking at the the github issues, it seems people that had the issue were also using the linux version...so we can't conclude anything about windows

clang-tools-extra/clangd/index/FileIndex.cpp
51

From what I observed, it (almost?) never fails to free memory, although sometimes it's only a few kB.
Anyway, are you suggesting to keep track of how many time we call malloc_trim and how many times it fails ? Or simply logging when it fails ?

qchateau updated this revision to Diff 312500.EditedDec 17 2020, 7:50 AM

Use malloc_trim only if available

Wow, all abstractions truly are lies. Thanks for the investigation!

After doing some reading[1], my understanding is roughly:

  • our pattern of allocations causes glibc malloc to retain a *much* larger heap than our outstanding allocations, growing without (or with large) bound
  • malloc maintains several arenas and grows each as needed, and only implicitly shrinks them "from the top". A lives-forever object at the end of an arena will prevent it implicitly shrinking past that point.
  • clangd has lives-almost-forever allocations, e.g. when background indexing that could be causing this. If we get unlucky and allocate it at a point when the arena happens to be really full, it doesn't matter if the arena spends most of its time mostly-empty.
  • use of multiple threads means more arenas, and thus more fragmentation/waste.
  • malloc_trim() is able to free more aggressively, looking "inside" the arenas [2]
  • this affects glibc malloc only, which explains why we've only had this problem reported on linux and we haven't gotten reports from google-internal users (our production binaries use tcmalloc)

Some back-of-the-envelope numbers: I guess we grow until we have enough space to satisfy every allocation in every arena, so the wasted space/gaps is NumArenas times what you expect, which is 8 * NumCores, which can easily be 500 on a big workstation - if we expect (wild guess) 80% efficiency = 25% overhead for a single-threaded app, then this could yield 12500% overhead = 0.8% efficiency at the steady state, which is a 100x increase in memory usage. This sounds like a plausible explanation for the bug reports (but I'm not an expert and am guessing a lot).

[1] particularly: https://stackoverflow.com/questions/38644578/understanding-glibc-malloc-trimming
[2] https://man7.org/linux/man-pages/man3/malloc_trim.3.html - see notes regarding glibc 2.8


So where does this leave us, other than fairly unimpressed with glibc malloc?

Periodic malloc_trim seems like it solves a real problem, one I don't personally fully understand but we may never do - I feel convinced we should use it anyway. It probably has some overhead, who knows how much.

We need to work out how to guard it not just for glibc, but also for using glibc malloc rather than alternative like jemalloc, tcmalloc et al.
(We could *require* the use of an alternative malloc and not fix this, but that doesn't seem terribly practical to me)

It's process-wide (not per-thread, and certainly not per-program-module) so the fact that slapping it in FileSymbols::buildIndex solves the problem only proves that function is called periodically :-)
This is a system-level function, I think we should probably be calling it periodically from something top-level like ClangdMain or ClangdLSPServer.
(We could almost slip it in with ClangdLSPServer::maybeExportMemoryProfile(), except we don't want to link it to incoming notifications as memory can probably grow while silently background indexing)

I think we should pass a nonzero pad to malloc_trim of several MB. It only affects the main thread, but the main thread is constantly allocating small short-lived objects (e.g. JSON encoding/decoding) and cutting it to the bone only to have it sbrk again seems pointless.

clang-tools-extra/clangd/index/FileIndex.cpp
50

this function belongs in llvm/Support/Process.h, next to Process::GetMallocUsage().

This is full of OS-specific bits (#ifdef HAVE_MALLINFO) etc that probably guide how we should set this up.

275

I think this should like in ClangdLSPServer, triggered by all of:

  • various events: outgoing notification, incoming notification, onBackgroundIndexProgress()
  • an option specified by ClangdMain (I think it might even be cleanest to inject a "PeriodicMemoryCleanup" function via ClangdLSPServer::Options)
  • a timer having expired since the last time we called this - maybe a minute?
qchateau planned changes to this revision.Dec 17 2020, 8:40 AM

Periodic malloc_trim seems like it solves a real problem, one I don't personally fully understand but we may never do - I feel convinced we should use it anyway. It probably has some overhead, who knows how much.

It does indeed look like facing the problem from the wrong side (fight the end result instead of finding the root cause) but at that point it does solve an annoying problem. As I investigated this, I found that replacing DenseMap and StringMap wtih std::map mitigates the issue. I think allocating huge chunks of contiguous memory temporarily make this issue worse. Not very useful, but hey, I'm just sharing what I observed. As I previously said, resolving this problem by changing containers or allocators is a lot of work and is not guaranteed to actually solve the issue.

It's process-wide (not per-thread, and certainly not per-program-module) so the fact that slapping it in FileSymbols::buildIndex solves the problem only proves that function is called periodically :-)
This is a system-level function, I think we should probably be calling it periodically from something top-level like ClangdMain or ClangdLSPServer.
(We could almost slip it in with ClangdLSPServer::maybeExportMemoryProfile(), except we don't want to link it to incoming notifications as memory can probably grow while silently background indexing)

It can indeed be called somewhere else, I tried a few different placed before submitting this diff. That can be change easily, it's not a problem. I'd also like to see it at higher level: my first implementation called it periodically, inspired from maybeExportMemoryProfile, but I got worried by the background index filling the RAM while the user is not in front of his computer, onBackgroundIndexProgress should address this issue.

I think we should pass a nonzero pad to malloc_trim of several MB. It only affects the main thread, but the main thread is constantly allocating small short-lived objects (e.g. JSON encoding/decoding) and cutting it to the bone only to have it sbrk again seems pointless.

Good point

I think we should pass a nonzero pad to malloc_trim of several MB. It only affects the main thread, but the main thread is constantly allocating small short-lived objects (e.g. JSON encoding/decoding) and cutting it to the bone only to have it sbrk again seems pointless.

It would require a lot of work, but JSON encoding/decoding is possible without lots of allocations.
If you want to really compress memory usage, losing the map like access, O(1), on json objects, in favour of storing them in a vector, O(n), would be beneficial. I don't think the jsons in clangd ever really have maps with large numbers of keys which are queried often.

qchateau updated this revision to Diff 312781.Dec 18 2020, 6:18 AM
  • Add MallocTrim to llvm::sys::Process
  • Implement malloc trimming in ClangdLSPServer
qchateau retitled this revision from [clangd] Trim memory after buildINdex to [clangd] Trim memory periodically.Dec 18 2020, 6:19 AM
qchateau updated this revision to Diff 312853.Dec 18 2020, 10:53 AM
  • Fix clang-tidy errors
MaskRay added inline comments.Dec 18 2020, 11:03 AM
llvm/lib/Support/Unix/Process.inc
122 ↗(On Diff #312853)

If the user uses jemalloc/tcmalloc with glibc, malloc_trim may exist while the function may do nothing.

Thanks, this looks much closer to something we can land.
Bunch of nits :-)

clang-tools-extra/clangd/ClangdLSPServer.cpp
1306 ↗(On Diff #312853)

can you leave a FIXME to do this without a lock?
I've been meaning to add a library for these "every N seconds" tasks that uses atomic instead.

clang-tools-extra/clangd/ClangdLSPServer.h
52 ↗(On Diff #312853)

I don't particularly think the interval needs to be set through options (memory profiling interval isn't), we can just hardcode this.

On the other hand, doing this at all should be optional, and I think we should enable it from ClangdMain - this is a weird thing for a library to be doing by default, and as it stands we'll be doing it in unit tests and stuff.

I think my favorite variant would be

/// If set, periodically called to release memory.
/// Consider llvm::sys::Process::ReleaseHeapMemory;
std::function<void()> MemoryCleanup = nullptr;
200 ↗(On Diff #312853)

why do we store another copy of this instead of using the one in Opts?

llvm/include/llvm/Config/config.h.cmake
139 ↗(On Diff #312853)

nit: the malloc_trim function

llvm/include/llvm/Support/Process.h
78 ↗(On Diff #312853)

"free" is a bit vague here - return unused heap memory to the system?

79 ↗(On Diff #312853)

nit: "On glibc malloc, this calls malloc_trim(3)"

81 ↗(On Diff #312853)

I'm not sure this is terribly useful to report, and if someone wants to implement this for other OSes/allocators then it may not be available. Maybe just make this void?

82 ↗(On Diff #312853)

similarly in the name of portability, we could just pick a number like 10MB and hard-code it in the implementation, at least for now - maybe we don't end up needing this flexibility, and the semantics are probably peculiar to glibc

82 ↗(On Diff #312853)

I'd give this a more generic name like ReleaseUnusedHeap

llvm/lib/Support/Unix/Process.inc
122 ↗(On Diff #312853)

True. Unless I'm missing something, it won't be completely free - it'll still acquire glibc-malloc locks and walk over whatever arenas exist.

@MaskRay is there anything sensible to do about this? We can introduce a cmake variable or something you're supposed to set if you're using another allocator, but:

  • that seems fragile
  • fundamentally doesn't support switching allocators with LD_PRELOAD, I guess
124 ↗(On Diff #312853)

this is expected, we don't want a compiler warning here

qchateau planned changes to this revision.Dec 19 2020, 3:02 AM
qchateau marked 2 inline comments as done.
qchateau added inline comments.
clang-tools-extra/clangd/ClangdLSPServer.h
52 ↗(On Diff #312853)

Sounds good, do you think [1] we should throttle the calls to MemoryCleanup or [2] throttle the calls to malloc_trim from within the callback ?

I don't particularly think the interval needs to be set through options (memory profiling interval isn't), we can just hardcode this.

On the other hand, doing this at all should be optional, and I think we should enable it from ClangdMain - this is a weird thing for a library to be doing by default, and as it stands we'll be doing it in unit tests and stuff.

I like the idea of configuring the MemoryCleanup function from ClangdMain, but it's as easy to have [3] an int option that configures the interval as it is to have [4] a boolean one to just disable it.

I actually think both points are related:

  • If we choose [2], then I'd go for [3] because it is more configurable and adds no code complexity.
  • If we choose [1], then I agree we can go for [4] to avoid an extra option
llvm/lib/Support/Unix/Process.inc
122 ↗(On Diff #312853)

I'm not worried about LD_PRELOAD, worst case it adds a little bit of extra runtime cost (calling malloc_trim although it's useless because we preloaded tcmalloc).

But I agree there is an issue here: we detect which cleanup function to use depending on the headers (in cmake), but it should actually depend on what we link. Not sure if this can be done properly :/
For now we only want to cleanup on glibc malloc, but I'll investigate with tcmalloc and jemalloc, maybe they do not work so well either after all. Then the problem would be even worse: I'd be tempted to call a similar function for these libraries as well, and although the developer may have tcmalloc in its system include, it does not mean he is linking it -> undefined symbol

Overall this approach of calling malloc_trim is looking grim to me because it depends on what's linked.

Either we can continue in this way, or try to find another way around this issue:

  • Encourage the developer to link another malloc implementation, but then it's only useful if the various distributions do it
  • Continue to dig on why glibc malloc performs so bad with clangd and try to improve things in clangd code
  • Something else ?
124 ↗(On Diff #312853)

Same for windows I guess ? If I understand correctly your logic is "if it's not available that's because we don't need it: don't warn"

Just attached a debugger and ran malloc_trim on my clangd instance which was sitting at 6.7GB, afterwards it was 1.3GB. Interestingly clangd held rock solid at 1.0GB recorded memory usage throughout.

Interestingly clangd held rock solid at 1.0GB recorded memory usage throughout.

What do you mean exactly ? Using the built in memory usage measurement ?

Sorry to mess you around like this, but I think I might have jumped the gun in asking this to be abstracted away in Process.h. My reasoning is:

  • Once we are injecting the cleanup function from ClangdMain, that's not actually a crazy place to put this kind of system-level code, behind #ifdefs
  • The abstraction costs us something: we can't skip all the locking etc for memory cleanups if it's a no-op on our platform
  • We're putting an abstraction around one implementation without any reason to believe other implementations need this treatment or can use a similar interface. This is likely to turn out to be a poor abstraction, and llvm/Support isn't the easiest place to iterate on it.

So I think my favorite option may be back to:

  • Have ClangdLSPServer::Options take a nullable std::function<void()> to abstract the (optional) platform-specific operation
  • Have ClangdMain contain platform #ifdefs and define+inject the wrapper that calls malloc_trim
  • Optionally: have a clangd or cmake flag to disable this even if glibc was detected. This provides a workaround for other allocators if we need it.

Again, I'm sorry if it seems we're going in circles. I might put together a patch for this variant to compare...

Just attached a debugger and ran malloc_trim on my clangd instance which was sitting at 6.7GB, afterwards it was 1.3GB. Interestingly clangd held rock solid at 1.0GB recorded memory usage throughout.

Great to have confirmation of this, thanks!
Clangd's own memory-usage staying constant is expected (it's counting the live objects), as is real usage being higher (we count a subset, and we don't count malloc overhead/fragmentation).

Interestingly clangd held rock solid at 1.0GB recorded memory usage throughout.

What do you mean exactly ? Using the built in memory usage measurement ?

Yeah pretty sure this is the $/memoryUsage request or equivalent.

clang-tools-extra/clangd/ClangdLSPServer.h
52 ↗(On Diff #312853)

Sounds good, do you think [1] we should throttle the calls to MemoryCleanup or [2] throttle the calls to malloc_trim from within the callback ?

I think ClangdLSPServer should throttle the calls to the callback, because exactly what events trigger it internally is an implementation detail, and triggering periodically is a more useful API.

I like the idea of configuring the MemoryCleanup function from ClangdMain, but it's as easy to have [3] an int option that configures the interval as it is to have [4] a boolean one to just disable it.

If you configure the callback as a`std::function`, then it has a null state that naturally means "don't clean up memory", so you don't need the boolean. In this context I'm not sure having an option to configure is worthwhile.

llvm/cmake/config-ix.cmake
235 ↗(On Diff #312853)

For clangd's purposes we really want to call this function if we're using glibc (and not finding it is bad!), and really don't want to call it otherwise (and don't want to call it even if we find it).
So we may just want to #ifdef __GLIBC__ instead, especially if we do this in clangd instead of llvm/Support.

(In any case, this has been in glibc for a *long* time)

llvm/lib/Support/Unix/Process.inc
122 ↗(On Diff #312853)

Overall this approach of calling malloc_trim is looking grim to me because it depends on what's linked.

Let's not get lost in the weeds:

  • glibc malloc is by far the common case (on glibc platforms)
  • we've confirmed there's a major problem with this configuration, and it's essentially a well-known implementation deficiency specific to glibc malloc -malloc_trim solves this fairly simply
  • side-effects on uncommon configurations are likely to be minor
  • we don't know of any other common configurations that need a memory-trimming solution. (In the projects I've found that work around this, none mention similar problems on other platforms)

So this is a good-if-not-perfect solution, and I think we should ship it whether we can fix all the rough edges or not. It's probably going to be the single biggest improvement from clangd 11->12 for linux users.

124 ↗(On Diff #312853)

That, and also "warnings are annoying if you can't do anything about them". I think the #warning is there in the other functions because the #if/#else are expected-but-not-guaranteed to be exhaustive, because most platforms provide this functionality, which isn't the case here.

Sorry to mess you around like this, but I think I might have jumped the gun in asking this to be abstracted away in Process.h.

No problem, there is not much code anyway. The thought process behind the "right" solution is by far the most important.

So I think my favorite option may be back to:

  • Have ClangdLSPServer::Options take a nullable std::function<void()> to abstract the (optional) platform-specific operation
  • Have ClangdMain contain platform #ifdefs and define+inject the wrapper that calls malloc_trim
  • Optionally: have a clangd or cmake flag to disable this even if glibc was detected. This provides a workaround for other allocators if we need it.

I'll go back to this then, and add all the other changes you suggested


In the meantime, I've done some testing with tcmalloc and jemalloc.

jemalloc does very good in this situation, the RSS goes down as soon as the memory is not required anymore:

  • RSS goes down when the file is done parsing
  • RSS goes down as soon as I close the file in VSCode

tcmalloc does fine, but is not ideal, the RSS does not grow to an unreasonable size, but does not go down when we release resources. It basically plateaus at the maximum usage (meaning what you use when parsing N files, N depending on the parallelism level). I'm not an expert on tcmalloc but from what I found it's supposed to release resources over time, which I failed to observe. tcmalloc does have an extension to actively release memory (MallocExtension::instance()->ReleaseFreeMemory()) which works as expected, bringing the RSS down. That being said, I'm not sure we need to do something about it, clangd will indeed waste a significant amount of RAM in this case, but [1] the usage is still reasonable and bounded and [2] it seems known that tcmalloc is a greedy allocator.

qchateau updated this revision to Diff 313000.Dec 20 2020, 2:43 PM
qchateau marked 16 inline comments as done.
  • Move platform specific code to ClangdMain
  • Generic memory cleanup callback in ClangdLSPServer

Here is the latest version, I believe it should mostly look like what you expect. Let me know if there is anything that you'd like me to change, it's totally fine to have a few more iterations on this diff.

I added a CLANGD_ENABLE_MEMORY_CLEANUP to make sure people can work around some weird linking issues with malloc_trim if necessary. I've noticed options such as CLANGD_ENABLE_REMOTE also appear in the files below, but as I have no idea what they are for, I did not add CLANGD_ENABLE_MEMORY_CLEANUP to them. Let me know if I should do anything concerning these files.

  • clang-tools-extra/clangd/test/lit.site.cfg.py.in
  • llvm/utils/gn/secondary/clang-tools-extra/clangd/BUILD.gn
  • llvm/utils/gn/secondary/clang-tools-extra/clangd/test/BUILD.gn
sammccall accepted this revision.Dec 21 2020, 10:19 AM

Awesome! Some nits on naming and stuff, but the only real substantial question for me is the exact availability/behavior of the command-line flag. Even there, as long as the default behavior is to trim when build with glibc, the details are less important.
Let me know when you want me to land it!

In the meantime, I've done some testing with tcmalloc and jemalloc.
jemalloc does very good in this situation, the RSS goes down as soon as the memory is not required anymore:
tcmalloc does fine, but is not ideal, the RSS does not grow to an unreasonable size, but does not go down when we release resources.

Thanks for this! I agree this is basically tcmalloc working as designed, it tends to sit at the high-water mark.

Let me know if I should do anything concerning these files.

  • clang-tools-extra/clangd/test/lit.site.cfg.py.in

This is only needed if we want to make lit tests run conditionally on whether the CMake variable is enabled. (For various reasons, I think testing this at all is probably more trouble than it's worth).

  • llvm/utils/gn/secondary/clang-tools-extra/clangd/BUILD.gn
  • llvm/utils/gn/secondary/clang-tools-extra/clangd/test/BUILD.gn

These are for the alternate gn build system, which is maintained by its users (commits are not expected to keep it working). It would be reasonable/friendly to blindly add "CLANGD_ENABLE_MEMORY_CLEANUP=1" to both files so that gn always builds in that configuration. FWIW I almost always forget/don't bother, and that's totally acceptable too.

clang-tools-extra/clangd/CMakeLists.txt
17 ↗(On Diff #313000)

In keeping with reducing the abstraction level, maybe we could call this CLANGD_MALLOC_TRIM and note "(only takes affect when using glibc)"?

clang-tools-extra/clangd/tool/ClangdMain.cpp
504 ↗(On Diff #313000)

currently we have the flag or not depending on the cmake flag, and it does something or not depending on whether we're using glibc.

I'd suggest rather we should have the flag if we have glibc (i.e. if it does something), and its *default* should depend on the cmake flag.

504 ↗(On Diff #313000)

It would also be nice to reduce the number of separate #ifdef sections (currently 4).

I'd suggest something like:

#ifdef __GLIBC__
opt<bool> EnableMallocTrim{
...
   init(CLANGD_MALLOC_TRIM),
};
static std::function<void()> MallocTrimFunc() {
  if (!EnableMallocTrim)
    return nullptr;
  return []{ ... };
}
#else
static std::function<void()> MallocTrimFunc() {
  return nullptr;
}
#endif

then later we can assign to opts without an ifdef or other condition

600 ↗(On Diff #313000)

now we don't have the abstraction, we can have our logging back:

if (malloc_trim(MallocTrimPad))
  vlog("Released memory via malloc_trim");
This revision is now accepted and ready to land.Dec 21 2020, 10:19 AM
qchateau updated this revision to Diff 313215.Dec 21 2020, 3:49 PM
qchateau marked 3 inline comments as done.

The log is back, I renamed the CMake option and the command line option and reduced the number of preprocessor directives.
I chose not to modify the files for the gn build system, I'd rather not break it.

I did not use the CMke option as the default value for the command line option: IMO this CMake option is only useful if you encounter build problems related to malloc_trim, so malloc_trim must not be part of the code when you disable it through CMake.

If this all makes sense and I did not make a mistake in this update, you can land this. Let me know if there are other small details you'd like me to change.

Email: quentin.chateau@gmail.com

sammccall accepted this revision.Dec 21 2020, 11:43 PM

I did not use the CMke option as the default value for the command line option: IMO this CMake option is only useful if you encounter build problems related to malloc_trim, so malloc_trim must not be part of the code when you disable it through CMake.

This seems a bit too conservative to me, I wouldn't expect anyone to ever encounter such build problems (e.g. glibc so old that malloc_trim doesn't exist, or header exists but library doesn't), so I'd add a workaround if they actually arose.
The other value of the CMake option is being able to disable the *default* behavior if you use another allocator, so that you don't have to ask all users to pass a custom flag.

I don't feel strongly about this though, in the end it'll be the same thing for ~all users since the option is on by default.

If this all makes sense and I did not make a mistake in this update, you can land this. Let me know if there are other small details you'd like me to change.

Email: quentin.chateau@gmail.com

Thanks so much for working on this, it was both horrifying and a huge relief to see what you'd worked out here!

I'll land now

nridge added a subscriber: nridge.Dec 21 2020, 11:48 PM
This revision was automatically updated to reflect the committed changes.

I feel like we need some kind of entry for this in release notes given how much of an issue it is.

I feel like we need some kind of entry for this in release notes given how much of an issue it is.

Good point. I usually procrastinate this and do a sweep of git log at release time, but while it's top of mind...

I put some notes for this in f7a26127f21fb1ca8252879ca647835ea7c5903d, feel free to suggest/make changes...

It does indeed look like facing the problem from the wrong side (fight the end result instead of finding the root cause) but at that point it does solve an annoying problem. As I investigated this, I found that replacing DenseMap and StringMap wtih std::map mitigates the issue. I think allocating huge chunks of contiguous memory temporarily make this issue worse. Not very useful, but hey, I'm just sharing what I observed. As I previously said, resolving this problem by changing containers or allocators is a lot of work and is not guaranteed to actually solve the issue.

@qchateau Hey, do you happen to remember *which* maps were changed? (No need to go through and repeat the investigation, just if you know).

I'm happy with how we resolved this robustly, but the fact that this is largely triggered by a few specific allocations is an unresolved mystery, and *maybe* there's room for more efficiency (if the malloc_trim is undoing work that need never be done).

My best guess is that hashtables in the index are both large (and thus likely allocated at the top of the arena rather than fitting into a free chunk) and long-lived (and thus block automatic arena shrinkage for a long time). And because the index tends to grow over time, it probably never fits into the old big gap!

In this case I guess there's nothing to do - the index snapshots need lots of memory, and allocating it in a huge chunk is probably optimal, even if we need to "hollow it out" with malloc_trim afterwards. A DenseMap that allows the hashtable to be split over multiple allocations might be interesting, but is obviously a massive project.
But curious if you have thoughts.

I tried to sketch a couple of tetris-based mental models for why these hashtables might never fit inside an existing gap, leading us to stack our arena size (tetris field height) higher and higher.
This is predicated on the assumption that the index hashtables are the biggest single contiguous allocations in clangd, which I think is true. (It also explains why std::map fares better - each node is a separate allocation. Unfortunately, this is also why it's performance is terrible...)
Not sure which/either is model true, but I found it entertaining.

A little follow up, but I've noticed if I have a lot of tabs open in my editor (usually happens as I forget to close them).
When clangd starts (or restarts) , in the first minute, memory usage will shoot right up as it starts to do its thing.
I've regularly seen it go over 15GB before the first trim call is made and it drops back down and settling at ~2GB. For reference clangd typically reports its usage is ~1.5GB when I'm working on LLVM.
This does result in some noticeable hanging of my system, which while not workstation level, its got more horsepower than a chromebook, no offence to anyone who works at Google :).

Maybe we should trim more aggressively in maybe the first couple of minutes of the process, which is typically when most allocations seem to happen. Then fall back to the once a minute.

If you look back at my original comment, you'll notice I originally trimmed the memory after FileSymbols::buildIndex which seemed to be the "end of the peak" memory usage, the goal was to counteract exactly what you describe: a huge memory usage when clangd warms-up and parses a lot of files in parallel.
I wonder if we can find a middle ground. The solution we finally adopted also has the advantage of being less intrusive, which is definitely great considering the nature of the issue