This is an archive of the discontinued LLVM Phabricator instance.

Use a custom container to reduce AFDO's memory usage by ~60%
AbandonedPublic

Authored by george.burgess.iv on Dec 12 2017, 6:59 PM.

Details

Summary

This patch introduces a new associative container that's optimized for either being empty or having one element, and makes our sampling profile data structures use it.

As it turns out, AFDO profiles build lots of empty (or almost-empty) maps. :)

The results of three tests I ran with/without the patch:

  • Merging 3.1GB of profile data (split across 226 profiles) consumed 10.2GB of RAM and took 42 seconds to terminate with this patch. Without, it took 26.3GB of RAM and took 70 seconds.
  • Given a 32MB profile in $foo, running echo | clang -O0 -x c - -fprofile-sample-use=$foo took 250MB of RAM with this patch, and 650MB without. Clang itself ate 40MB without -fprofile-sample-use being passed.
  • Given a 50MB profile collected by someone else, I observed a drop from 1.6GB of RAM consumption to 569MB with this patch.

Aside: I'm unsure of what the requirements are to land a new data structure in ADT. If this seems too tailored to AFDO's uses or too incomplete (e.g. it lacks insert, remove, and emplace; it doesn't support a custom less-than op, ...) on its own, I'm happy to move it elsewhere or add whatever's missing.

Diff Detail

Event Timeline

include/llvm/ADT/TinyAssociativeVector.h
39

Visibility boost for this bit ^

Is it enough to simply replace StringMap with std::map<std::string, ..> for small maps? Do you have memory data with that?

include/llvm/ProfileData/SampleProf.h
188–191

I wonder how much is saved by replacing just std::map with TinyAssociativeVector.

426

Can you split out this clean up change into a separate patch?

george.burgess.iv marked 2 inline comments as done.

Addressed feedback; thanks!

Also, I made FunctionSamplesMap's key a std::string, so this is as close to NFC as possible. Doing so is a bit pricey, but we're still left with a 50-55% memory reduction. If we want, we can swap to StringRef in its own CL.

Is it enough to simply replace StringMap with std::map<std::string, ..> for small maps? Do you have memory data with that?

Please see the response to your inline comment. The short answer is "that is a good win, but more on the order of 12%." I don't know if that's enough, but can check if you'd like.

include/llvm/ProfileData/SampleProf.h
188–191

For the 1.5GB profile case:

  • Only changing BodySampleMap and CallsiteSampleMap to TinyAssociativeVectors saved us 20% (1.603GB -> 1.288GB)
  • Only changing FunctionSamplesMap and CallTargetMap to TinyAssociativeVectors (FunctionSamplesMap using StringRefs) saved us 45% (1.603GB -> 884MB)
  • Only changing FunctionSamplesMap and CallTargetMap to std::maps (FunctionSamplesMap using StringRefs) saved us 22% (1.603GB -> 1.219GB)

Changing an individual StringMap -> std::map or StringMap/std::map to TinyAssociativeVector is always a win in memory usage, in my tests.

The majority of the benefit is from FunctionSamplesMap: changing that alone to a TinyAssociativeVector drops memory usage by 34% (1.603GB -> 1.025GB). Of that, a reasonable portion of the benefit is from not copying strings around: using a TinySortedMapVector<std::string, FunctionSamples> for FunctionSamplesMap lands us at 1.159GB of memory use, which is a 26% reduction over a StringMap.

If there are any other combinations you'd like to see, please let me know. I have an earlier version of this that makes toggling these (and any required usage-related changes) really easy. :)

426

Done. Will submit as a follow-up to this one

I suggest making this a multiple stage change.

patch-1: change expensive map type to std::map to materialize memory savings quickly -- with the data you collected, it is quite safe to check it in
patch-2: TinyMap implementation patch
patch-3: Use TinyMap in samplePGO -- with incremental savings data collected.

Patch-2 needs more closer look by someone like Chandler. Data from patch-3 or perhaps some other potential users can help justify patch-2.

george.burgess.iv abandoned this revision.Dec 14 2017, 3:34 PM

I like that idea. :)

Also, it looks like I was mistaken: CallTargetMap is better off a StringMap than a std::map<std::string, uint64_t>. The difference was around 5%, which makes step one save us 17% of memory. Checked it in as r320764, as requested.

I'll do step 2 and step 3 in fresh, more targeted reviews.

Thanks again!