This is an archive of the discontinued LLVM Phabricator instance.

[llvm-profdata] GUIDToFuncNameMap can be static
Needs ReviewPublic

Authored by huangjd on Jun 21 2023, 5:45 PM.

Details

Summary

FunctionSamples::GUIDToFuncNameMap can be static instead of per
function sample. It is only used by SampleProfileLoader on a per module
basis and there's never a case (and we probably don't want such case)
that different FunctionSamples have differnet GUIDToFuncNameMap because it's
just reverse MD5 lookup!

This reduce the size of a FunctionSamples object, which has been a
bottleneck on profile loading.

Diff Detail

Event Timeline

huangjd created this revision.Jun 21 2023, 5:45 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 21 2023, 5:45 PM
huangjd requested review of this revision.Jun 21 2023, 5:45 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 21 2023, 5:45 PM
hoy added inline comments.Jun 21 2023, 5:52 PM
llvm/lib/Transforms/IPO/SampleProfile.cpp
363

A static GUIDToFuncNameMap is shared across threads but here it's made to point to a module-local DS. This could be problematic. Also it cause race condition.

huangjd updated this revision to Diff 533818.Jun 22 2023, 4:55 PM

Use thread_local

huangjd marked an inline comment as done.Jun 22 2023, 4:56 PM
huangjd added inline comments.
llvm/lib/Transforms/IPO/SampleProfile.cpp
363

Changed it to thread_local

hoy added inline comments.Jun 23 2023, 9:46 AM
llvm/lib/Transforms/IPO/SampleProfile.cpp
362

nit: please update the comment. The update is per-module now.

Also it might be better to just move this out of the constructor into SampleProfileLoader::runOnModule. The container CurrentGUIDToFuncNameMap here is basically a ref to GUIDToFuncNameMap of SampleProfileLoader.

372

Please update the comment and mov it out to SampleProfileLoader::runOnModule too.

V1 of this patch reverts things back to before D65848, and D65848 was added to fix corruption from race condition.

The current version of this patch is similar to V1 of D65848, but we removed TLS because of this comment on LLVM_THREAD_LOCAL in include/llvm/Support/Compiler.h:

/// This is essentially an extremely restricted analog to C++11's thread_local
/// support. It uses thread_local if available, falling back on gcc __thread
/// if not. __thread doesn't support many of the C++11 thread_local's
/// features. You should only use this for PODs that you can statically
/// initialize to some constant value. In almost all circumstances this is most
/// appropriate for use with a pointer, integer, or small aggregation of
/// pointers and integers.

We should consider having singleton instance and use LLVM's synchronization support to guard updating. Assuming updating the map is rare so the contention should have low runtime impact.

We should consider having singleton instance and use LLVM's synchronization support to guard updating. Assuming updating the map is rare so the contention should have low runtime impact.

Essentially this is per-module map that needs to be accessed from FunctionSamples. For local (non-distributed) ThinLTO, each thin-backend thread will build part of the map (for symbols in that module) and access that part of the map, so write is not uncommon and contention could be common. I think TLS is the best option if it's available to use.

OTOH, what's the total size of FunctionSamples, and how significant is saving 8 byte in it?

hoy added a comment.Jun 23 2023, 5:15 PM

We should consider having singleton instance and use LLVM's synchronization support to guard updating. Assuming updating the map is rare so the contention should have low runtime impact.

Essentially this is per-module map that needs to be accessed from FunctionSamples. For local (non-distributed) ThinLTO, each thin-backend thread will build part of the map (for symbols in that module) and access that part of the map, so write is not uncommon and contention could be common. I think TLS is the best option if it's available to use.

OTOH, what's the total size of FunctionSamples, and how significant is saving 8 byte in it?

TLS is now available since LLVM_ENABLE_THREADS is on by default, but on some platforms it's still off. A safe way might be to condition the current change under LLVM_ENABLE_THREADS.

A safe way might be to condition the current change under LLVM_ENABLE_THREADS.

I don't think it's a good idea to sprinkle LLVM_ENABLE_THREADS in a bunch of places just for this.

We should consider having singleton instance and use LLVM's synchronization support to guard updating. Assuming updating the map is rare so the contention should have low runtime impact.

Essentially this is per-module map that needs to be accessed from FunctionSamples. For local (non-distributed) ThinLTO, each thin-backend thread will build part of the map (for symbols in that module) and access that part of the map, so write is not uncommon and contention could be common. I think TLS is the best option if it's available to use.

Is this map initialized once and used mostly readonly? if that is the case, serializing initialization from all threads (in the order of tens, not hundreds) is no different from one static initialization done globally. Anyway, the effect needs to be measured to verify.

OTOH, what's the total size of FunctionSamples, and how significant is saving 8 byte in it?

We should consider having singleton instance and use LLVM's synchronization support to guard updating. Assuming updating the map is rare so the contention should have low runtime impact.

Essentially this is per-module map that needs to be accessed from FunctionSamples. For local (non-distributed) ThinLTO, each thin-backend thread will build part of the map (for symbols in that module) and access that part of the map, so write is not uncommon and contention could be common. I think TLS is the best option if it's available to use.

Is this map initialized once and used mostly readonly? if that is the case, serializing initialization from all threads (in the order of tens, not hundreds) is no different from one static initialization done globally. Anyway, the effect needs to be measured to verify.

Nope. The map will be written to every time a thread picks up a new module to run thin-backend. When processing a new module, name mapping for functions in that module will need to be written to the map. So we can't really front load all writes for each thread, as this is like a per-module map.

OTOH, what's the total size of FunctionSamples, and how significant is saving 8 byte in it?

Sorry for delayed response. This fell through the cracks..