This revisions implements sharding in the storage of parametric instances to decrease lock contention by sharding out the allocator/mutex/etc. to use for a specific storage instance based on the hash key. This is a somewhat common approach to reducing lock contention on data structures, and is used by the concurrent hashmaps provided by folly/java/etc. For several compilations tested, this removed all/most lock contention from profiles and reduced compile time by several seconds.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Right now I've set it so that all types of storage instances use 8 shards, but that isn't that great. Some storage instances have higher concurrency needs, e.g. DictionaryAttr, while others have close to none, e.g. IntegerType. It might be a good idea to expose the level of concurrency expected by specific storage instances, but left that out from the initial patch. Thoughts on this?
mlir/lib/Support/StorageUniquer.cpp | ||
---|---|---|
158 | This requires numShards to be a power of 2, worth documenting somewhere + adding an assert. Alternatively, round to the closest power of two before this. |
mlir/lib/Support/StorageUniquer.cpp | ||
---|---|---|
91 | Do we need to guard these on when threading is enabled? E.g., have the non threaded case be completely non-sharded and so not even create an array? | |
132 | I don't follow this, why do we not care about which shard for arbitrary mutation? Also does this need to be public? |
Resolve comments
mlir/lib/Support/StorageUniquer.cpp | ||
---|---|---|
91 | Updated to wrap all of the complicated bits within an if LLVM_ENABLE_THREADS so that we don't incur any complexity when threading is disabled at build time. | |
132 |
Updated to find the shard that allocated the storage instance.
Yes, but it is fine because everything here is local to the StorageUniquer.cpp file. We could make it private, but only by moving it to be nested within the StorageUniquerImpl class. | |
158 | Added an assert for now given that this is still local to this file. If we expose concurrency to users, I suspect we wouldn't expose exact shard numbers but have something akin to concurrency levels. |
Do we need to guard these on when threading is enabled? E.g., have the non threaded case be completely non-sharded and so not even create an array?