This is an archive of the discontinued LLVM Phabricator instance.

[DWARFLinkerParallel] Add StringPool class.
ClosedPublic

Authored by avl on Jan 2 2023, 4:22 AM.

Details

Summary

This patch is extracted from D96035. It adds StringPool class.
StringPool allows to store strings in parallel. It also allows
to have string data associated with the concrete string.

Diff Detail

Event Timeline

avl created this revision.Jan 2 2023, 4:22 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 2 2023, 4:22 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
avl requested review of this revision.Jan 2 2023, 4:22 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 2 2023, 4:22 AM

We probably already have a few StringPools kicking around in LLVM - a multithreaded one sounds like something LLDB already uses (it has a core StringPool for interning strings) - any chance the LLDB one could be sunk into LLVM and reused for your use case as well, for instance?

avl added a comment.Jan 3 2023, 3:58 PM

We probably already have a few StringPools kicking around in LLVM - a multithreaded one sounds like something LLDB already uses (it has a core StringPool for interning strings) - any chance the LLDB one could be sunk into LLVM and reused for your use case as well, for instance?

I am not sure that found the right piece of lldb code:

"lldb/Core/UniqueCStringMap.h" - this one is not multi-thread.
"lldb/Core/ThreadSafeDenseMap.h" - this one looks to be not effective in heavy multi-thread environment. It locks the whole table for every operation. In heavy multi-thread environment threads would wait each another.

Thus, it looks like both of them are not good candidate.

We probably already have a few StringPools kicking around in LLVM - a multithreaded one sounds like something LLDB already uses (it has a core StringPool for interning strings) - any chance the LLDB one could be sunk into LLVM and reused for your use case as well, for instance?

I am not sure that found the right piece of lldb code:

"lldb/Core/UniqueCStringMap.h" - this one is not multi-thread.
"lldb/Core/ThreadSafeDenseMap.h" - this one looks to be not effective in heavy multi-thread environment. It locks the whole table for every operation. In heavy multi-thread environment threads would wait each another.

Thus, it looks like both of them are not good candidate.

I think the code David is talking about is in ConstString.cpp. LLDB uniques strings so that it can compare strings by comparing pointers in a unique string pool. This string pool is thread safe and goes to great length to reduce any lock contention. You might be able to loot some code from there if needed.

This approach seems to avoid that contention by creating thread local string pools? Those would then need to be merged later?

avl added a comment.Jan 4 2023, 10:12 AM

I think the code David is talking about is in ConstString.cpp.

thank you for pointing. somehow I missed it.

LLDB uniques strings so that it can compare strings by comparing pointers in a unique string pool.

this patch(D140841) does the same.

This string pool is thread safe and goes to great length to reduce any lock contention. You might be able to loot some code from there if needed.

Generally speaking two solutions are similar with following differences:

lldb pool uses array of single thread hash-maps while D140841 uses single multi-thread hashmap.

lldb pool uses array of allocators while D140841 uses thread local allocators.

Using multi-thread hashmap, in theory, might have the advantage of better performance in multi-thread mode.
I will compare numbers for both implementations and post here.

Also multi-thread hashmap from D132455 is more general in that sense that it might be used for non strings.
f.e. dwarf Type pool could be implemented the same way. While solution based on StringMap might be used only for strings.

If thread locals allocators are not good then similar solution(array of allocators) might be used for D140841.
Then this solution would look pretty match as lldb pool.

This approach seems to avoid that contention by creating thread local string pools? Those would then need to be merged later?

No, they should not be merged. Threads created by ThreadPool are live until destructor of ThreadPool is called. Thus all thread local data are available and might be referenced. StringPool contains pointers to the thread local data which should be valid until the destructor of ThreadPool is called.

JDevlieghere added a comment.EditedJan 4 2023, 10:54 AM

I'm definitely curious to hear how the two compare in terms of performance. I suppose it shouldn't be too hard to build ConstString on top of this if it turns out to be faster.

avl added a comment.Jan 9 2023, 3:01 AM

I'm definitely curious to hear how the two compare in terms of performance. I suppose it shouldn't be too hard to build ConstString on top of this if it turns out to be faster.

@JDevlieghere here is the result of performance comparison. All runs insert 100000000 strings produced from corresponding integer. Also ConcurrentHashTable pool measured with setting initial size. lldb pool does not have this feature.

-------------------------------------------------------------------
              |      lldb-pool      |   ConcurrentHashTable pool  |
-------------------------------------------------------------------
              |                     |                             |
  threads 1   | time:     22sec     | time:       19.7sec         |
              | mem:       5G       |  mem:        6.7G           |
              |                     |                             |
-------------------------------------------------------------------
              |                     |                             |
  threads 16  | time:     4.2sec    | time:        3.5sec         |
              | mem:       5G       |  mem:        6.7G           |
              |                     |                             |
-------------------------------------------------------------------
              |                     |                             |
 initial-size |                     |                             |
   100000000  |                     | time:       10.5sec         |
              |                     |  mem:        6.0G           |
  threads 1   |                     |                             |
              |                     |                             |
-------------------------------------------------------------------
              |                     |                             |
 initial-size |                     |                             |
   100000000  |                     | time:        2.1sec         |
              |                     |  mem:        6.0G           |
  threads 16  |                     |                             |
              |                     |                             |
-------------------------------------------------------------------

ConcurrentHashTable works faster, though it requires more memory.

avl added a comment.Jan 15 2023, 12:41 PM

There were implemented space optimizations for D132455. With these optimizations performance comparison looks like this:

Ubuntu
Processor Name: Intel(R) Core(TM) i9-9900
Processor Speed: 3.10GHz
Number of Processors: 1
Total Number of Cores: 8
Memory: 64 GB

-------------------------------------------------------------------------
                     |      lldb-pool      |  ConcurrentHashTable pool  |
-------------------------------------------------------------------------
  threads 1          | time:     22.9sec   | time:       17.8sec        |
                     | mem:       5.0G     |  mem:        5.56G         |
-------------------------------------------------------------------------
  threads  8         | time:      5.4sec   | time:        4.5sec        |
                     | mem:       5.0G     |  mem:        5.57G         |
-------------------------------------------------------------------------
  threads 16         | time:      4.2sec   | time:        3.3sec        |
                     | mem:       5.0G     |  mem:        5.56G         |
-------------------------------------------------------------------------
 init-size 100000000 |                     | time:       11.3sec        |
  threads  1         |                     |  mem:        5.2G          |
-------------------------------------------------------------------------
 init-size 100000000 |                     | time:        2.8sec        |
  threads  8         |                     |  mem:        5.2G          |
-------------------------------------------------------------------------
 init-size 100000000 |                     | time:        2.5sec        |
  threads 16         |                     |  mem:        5.2G          |
-------------------------------------------------------------------------

Darwin
Processor Name: 12-Core Intel Xeon E5
Processor Speed: 2.7 GHz
Number of Processors: 1
Total Number of Cores: 12
Memory: 64 GB

-------------------------------------------------------------------------
                     |      lldb-pool      |  ConcurrentHashTable pool  |
-------------------------------------------------------------------------
  threads 1          | time:     33.4sec   | time:       26.2sec        |
                     | mem:       5.3G     |  mem:        5.7G          |
-------------------------------------------------------------------------
  threads  8         | time:     11.2sec   | time:        4.9sec        |
                     | mem:       5.4G     |  mem:        5.7G          |
-------------------------------------------------------------------------
  threads 24         | time:     29.8sec   | time:        2.3sec        |
                     | mem:       5.5G     |  mem:        5.7G          |
-------------------------------------------------------------------------
 init-size 100000000 |                     | time:       18.8sec        |
  threads  1         |                     |  mem:        5.3G          |
-------------------------------------------------------------------------
 init-size 100000000 |                     | time:        3.3sec        |
  threads  8         |                     |  mem:        5.3G          |
-------------------------------------------------------------------------
 init-size 100000000 |                     | time:        1.8sec        |
  threads 24         |                     |  mem:        5.3G          |
-------------------------------------------------------------------------
avl updated this revision to Diff 489577.Jan 16 2023, 8:54 AM

use djb hash function. rebased.

Looks fine to me as is seems performant. I will let owners of the DWARF linker approve for good.

Anyone else already planning to try to move lldb over to use this? The numbers look pretty promising.

JDevlieghere accepted this revision.Jan 17 2023, 11:08 AM

LGTM for the DwarfLinker.

Alexey, how hard would it be to port over the ConstString implementation to use the ConcurrentHashTable?

This revision is now accepted and ready to land.Jan 17 2023, 11:08 AM
avl added a comment.EditedJan 18 2023, 3:54 AM

LGTM for the DwarfLinker.

Alexey, how hard would it be to port over the ConstString implementation to use the ConcurrentHashTable?

I think it should not be hard. Thing which I am not sure at the moment is - would it be OK to have thread local allocators for lldb. Anyway, I will prepare a patch and let`s see.

avl added a comment.Jan 25 2023, 7:19 AM

LGTM for the DwarfLinker.

Alexey, how hard would it be to port over the ConstString implementation to use the ConcurrentHashTable?

I think it should not be hard. Thing which I am not sure at the moment is - would it be OK to have thread local allocators for lldb. Anyway, I will prepare a patch and let`s see.

So far, the answer is that port over the ConstString implementation to use the ConcurrentHashTable requires some work. The main thing is that ConcurrentHashTable assumes that it uses thread safe memory pool. For the use case from D96035 there may be used thread local BumpPtrAllocator or allocator from D142318(as it uses Thread Pools to create threads). These solutions could not be used for lldb, as it creates threads not only using ThreadPools. So to port over the ConstString implementation to use the ConcurrentHashTable it is necessary to implement thread safe BumpPtrAllocator not relying on ThreadPools.

avl updated this revision to Diff 509377.Mar 29 2023, 8:21 AM

rebased.

avl retitled this revision from [DWARFLinkerNext] Add StringPool class. to [DWARFLinkerParallel] Add StringPool class..Mar 29 2023, 8:23 AM
avl edited the summary of this revision. (Show Details)
avl updated this revision to Diff 509456.Mar 29 2023, 1:01 PM

rebased.

JDevlieghere accepted this revision.Mar 30 2023, 10:15 AM
avl updated this revision to Diff 509765.Mar 30 2023, 12:01 PM

rebased.

This revision was automatically updated to reflect the committed changes.