This patch uses the lower 64 bits of the MD5 hash of a function name as
a GUID in the function index, instead of storing function names. Any
local function is first given a global name by prepending the original
source file name. This is the same naming scheme and GUID used by PGO in
the indexed profile format.
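As a rough illustration of the naming scheme and hash described above, here is a self-contained C++ sketch. It bundles a compact MD5 (RFC 1321) so it runs without LLVM. The helper names `md5Digest`, `globalIdentifier` and `guidFor` are hypothetical, the `:` separator between file name and function name is an assumption, and "lower 64 bits" is assumed to mean the first 8 digest bytes read as a little-endian integer. It is not the patch's code.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Compact MD5 (RFC 1321); returns the 16-byte digest.
std::array<uint8_t, 16> md5Digest(const std::string &Msg) {
  // Per-round left-rotate amounts.
  static const int Shifts[4][4] = {
      {7, 12, 17, 22}, {5, 9, 14, 20}, {4, 11, 16, 23}, {6, 10, 15, 21}};
  // K[i] = floor(2^32 * |sin(i + 1)|), as defined by RFC 1321.
  uint32_t K[64];
  for (int I = 0; I < 64; ++I)
    K[I] = static_cast<uint32_t>(
        std::floor(std::fabs(std::sin(I + 1.0)) * 4294967296.0));

  // Padding: 0x80, zeros up to 56 mod 64, then the bit length (little-endian).
  std::vector<uint8_t> Buf(Msg.begin(), Msg.end());
  uint64_t BitLen = static_cast<uint64_t>(Msg.size()) * 8;
  Buf.push_back(0x80);
  while (Buf.size() % 64 != 56)
    Buf.push_back(0);
  for (int I = 0; I < 8; ++I)
    Buf.push_back(static_cast<uint8_t>(BitLen >> (8 * I)));

  uint32_t H[4] = {0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476};
  for (size_t Off = 0; Off < Buf.size(); Off += 64) {
    // Load the 64-byte chunk as sixteen little-endian 32-bit words.
    uint32_t M[16];
    for (int I = 0; I < 16; ++I)
      M[I] = uint32_t(Buf[Off + 4 * I]) | uint32_t(Buf[Off + 4 * I + 1]) << 8 |
             uint32_t(Buf[Off + 4 * I + 2]) << 16 |
             uint32_t(Buf[Off + 4 * I + 3]) << 24;
    uint32_t A = H[0], B = H[1], C = H[2], D = H[3];
    for (int I = 0; I < 64; ++I) {
      uint32_t F;
      int G;
      if (I < 16) { F = (B & C) | (~B & D); G = I; }
      else if (I < 32) { F = (D & B) | (~D & C); G = (5 * I + 1) % 16; }
      else if (I < 48) { F = B ^ C ^ D; G = (3 * I + 5) % 16; }
      else { F = C ^ (B | ~D); G = (7 * I) % 16; }
      F = F + A + K[I] + M[G];
      int S = Shifts[I / 16][I % 4];
      A = D; D = C; C = B;
      B = B + ((F << S) | (F >> (32 - S)));
    }
    H[0] += A; H[1] += B; H[2] += C; H[3] += D;
  }
  // Digest is H[0..3], each word emitted little-endian.
  std::array<uint8_t, 16> Out;
  for (int I = 0; I < 16; ++I)
    Out[I] = static_cast<uint8_t>(H[I / 4] >> (8 * (I % 4)));
  return Out;
}

// Hypothetical helper: local functions get the source file name prepended
// (the ':' separator is an assumption of this sketch).
std::string globalIdentifier(const std::string &Name, bool IsLocal,
                             const std::string &SourceFileName) {
  return IsLocal ? SourceFileName + ":" + Name : Name;
}

// GUID = first 8 digest bytes read as a little-endian uint64_t (assumed
// interpretation of "lower 64 bits").
uint64_t guidFor(const std::string &GlobalName) {
  std::array<uint8_t, 16> D = md5Digest(GlobalName);
  uint64_t G = 0;
  for (int I = 7; I >= 0; --I)
    G = (G << 8) | D[I];
  return G;
}
```

For example, two `static` functions both named `helper` in `a.c` and `b.c` get the global names `a.c:helper` and `b.c:helper`, and therefore (with overwhelming probability) distinct GUIDs.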
This change has a couple of benefits. The primary benefit is size
reduction in the combined index file; for example, 483.xalancbmk's
combined index file was reduced by around 70%. It should also reduce
the memory footprint of the index once loaded, since the in-memory map
is also keyed by the hash instead of the string.
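The key-size part of that saving follows from simple arithmetic: a name key costs as many bytes as the (often long, mangled) name, while a GUID key is always 8 bytes. The sketch below only counts key bytes with hypothetical helpers; it ignores container overhead and is not how the 70% figure was measured.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Bytes spent on keys alone when the index is keyed by function name.
size_t nameKeyBytes(const std::vector<std::string> &Names) {
  size_t Total = 0;
  for (const std::string &N : Names)
    Total += N.size();
  return Total;
}

// Bytes spent on keys when every name is replaced by a 64-bit GUID.
size_t guidKeyBytes(const std::vector<std::string> &Names) {
  return Names.size() * sizeof(uint64_t);
}
```

Names shorter than 8 bytes would actually grow, but mangled C++ names are typically far longer than that, so the fixed-size key wins on aggregate.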
Second, this enables integration with indirect call promotion, since the
indirect call profile targets are recorded using the same global naming
convention and hash. The function importer can therefore easily locate
function summaries for indirect call profile targets, allowing them to
be imported and subsequently promoted.
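A minimal sketch of why a shared GUID helps: if the value profile records each indirect call target by GUID, the importer can look the target up in a GUID-keyed index directly, with no name reconstruction. `FunctionSummary`, `IndirectTarget`, `findImportCandidates` and the hotness threshold are hypothetical simplifications, not LLVM's actual types or importer logic.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

using GUID = uint64_t;

// Hypothetical, heavily simplified per-function summary.
struct FunctionSummary {
  std::string Name;   // kept only for readability in this sketch
  unsigned InstCount; // e.g. input to an import-size threshold
};

// Hypothetical value-profile entry for one indirect call site.
struct IndirectTarget {
  GUID Target;
  uint64_t Count;
};

// Collect summaries of profiled targets whose count reaches HotThreshold.
// Profile and index share the same GUID, so this is a plain map lookup.
std::vector<const FunctionSummary *>
findImportCandidates(const std::unordered_map<GUID, FunctionSummary> &Index,
                     const std::vector<IndirectTarget> &Profile,
                     uint64_t HotThreshold) {
  std::vector<const FunctionSummary *> Result;
  for (const IndirectTarget &T : Profile) {
    if (T.Count < HotThreshold)
      continue;
    auto It = Index.find(T.Target);
    if (It != Index.end()) // the target may not be part of this link
      Result.push_back(&It->second);
  }
  return Result;
}
```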
The original source file name is recorded in the bitcode in a new
module-level record for use in the ThinLTO backend pipeline.
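A small sketch, under assumptions, of why the source file name has to be recorded: in the ThinLTO backend the module identifier is typically the bitcode file path, which need not match the source file name used when the GUIDs (and PGO names) were computed. `ModuleInfo` and `globalIdentifier` are hypothetical, and the `:` separator is assumed.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified view of a module in the backend.
struct ModuleInfo {
  std::string ModuleIdentifier; // e.g. path of the bitcode file being compiled
  std::string SourceFileName;   // value read back from the new module record
};

// Rebuild the global name of a function. Using the recorded source file name
// (not the module identifier) reproduces the name, and thus the GUID, that
// was computed at compile time and by PGO.
std::string globalIdentifier(const std::string &Name, bool IsLocal,
                             const ModuleInfo &M) {
  return IsLocal ? M.SourceFileName + ":" + Name : Name;
}
```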
I am a little concerned about using the DenseMap data structure here.
Both DenseMap and StringMap are implemented as open hash tables with quadratic probing. The load factor of both is kept below 0.75, which means a large table has many empty buckets. The main differences are:
Another drawback is that if elements are inserted into the map one by one, it will incur many reallocations (just like a vector) unless the size of the map is known beforehand and the map is sized accordingly at the start.
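To make the load-factor and reallocation concerns concrete, here is a toy open-addressing set in C++. It is loosely modeled on the scheme described above (power-of-two bucket count, quadratic probing via triangular increments, doubling once the table would reach 3/4 occupancy), but it is not DenseMap or StringMap; the class name, the hash mixing, and the use of key 0 as the empty marker are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy open-addressing set of non-zero uint64_t keys (0 marks an empty bucket).
class ToyOpenHashSet {
public:
  explicit ToyOpenHashSet(size_t InitBuckets = 8) : Buckets(InitBuckets, 0) {}

  // Pre-size so that N insertions never trigger a grow.
  void reserve(size_t N) {
    size_t Needed = 8;
    while (N * 4 >= Needed * 3)
      Needed *= 2;
    if (Needed > Buckets.size())
      rehash(Needed);
  }

  void insert(uint64_t Key) {
    assert(Key != 0 && "0 is reserved as the empty marker");
    // Grow (double) before the table would reach 3/4 occupancy.
    if ((NumEntries + 1) * 4 >= Buckets.size() * 3)
      rehash(Buckets.size() * 2);
    size_t Slot = findSlot(Key);
    if (Buckets[Slot] == 0) {
      Buckets[Slot] = Key;
      ++NumEntries;
    }
  }

  bool contains(uint64_t Key) const {
    return Key != 0 && Buckets[findSlot(Key)] == Key;
  }
  size_t size() const { return NumEntries; }
  size_t bucketCount() const { return Buckets.size(); }
  size_t rehashCount() const { return Rehashes; }

private:
  std::vector<uint64_t> Buckets;
  size_t NumEntries = 0;
  size_t Rehashes = 0;

  // Probe h, h+1, h+1+2, ... modulo a power of two; this triangular sequence
  // visits every bucket, and an empty one always exists (load < 3/4).
  size_t findSlot(uint64_t Key) const {
    size_t Mask = Buckets.size() - 1;
    size_t Slot = static_cast<size_t>(Key * 0x9E3779B97F4A7C15ULL) & Mask;
    for (size_t Probe = 1; Buckets[Slot] != 0 && Buckets[Slot] != Key; ++Probe)
      Slot = (Slot + Probe) & Mask;
    return Slot;
  }

  // Allocate a new bucket array and re-insert every live key.
  void rehash(size_t NewSize) {
    std::vector<uint64_t> Old;
    Old.swap(Buckets);
    Buckets.assign(NewSize, 0);
    NumEntries = 0;
    ++Rehashes;
    for (uint64_t K : Old)
      if (K != 0)
        insert(K);
  }
};
```

In this toy, inserting 1000 keys one at a time from 8 buckets triggers eight doubling rehashes (ending at 2048 buckets, under half full), while pre-sizing for 1000 entries performs one allocation up front and no rehash during insertion.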
I suggest using std::map if the size of the map is not known a priori. If it is known, and if lookups only happen after the map has been fully built, it might be better to use a vector (push_back) followed by a single sort once the map is populated.
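The build-then-sort alternative suggested above could look roughly like this: append entries to a vector during construction, sort once by GUID, then answer lookups with binary search. `SortedGuidIndex` and `Summary` are hypothetical, and duplicate GUIDs are not handled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using GUID = uint64_t;

// Hypothetical stand-in for a function summary.
struct Summary {
  std::string Name;
};

class SortedGuidIndex {
  using Entry = std::pair<GUID, Summary>;
  std::vector<Entry> Entries;

public:
  // Build phase: plain appends, amortized O(1) each.
  void add(GUID G, Summary S) { Entries.emplace_back(G, std::move(S)); }

  // Sort once by GUID after all entries have been added.
  void finalize() {
    std::sort(Entries.begin(), Entries.end(),
              [](const Entry &A, const Entry &B) { return A.first < B.first; });
  }

  // Lookup phase: binary search; only valid after finalize().
  const Summary *find(GUID G) const {
    auto It = std::lower_bound(
        Entries.begin(), Entries.end(), G,
        [](const Entry &E, GUID Key) { return E.first < Key; });
    if (It == Entries.end() || It->first != G)
      return nullptr;
    return &It->second;
  }
};
```

The trade-off is one O(n log n) sort and contiguous storage without per-entry allocations, in exchange for lookups being valid only after `finalize()`; later insertions would require re-sorting. `std::map` lifts that restriction at the cost of a heap-allocated node per entry.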