See PR 45819 for additional information (such as how to reproduce the bug, problem analysis etc).
https://bugs.llvm.org/show_bug.cgi?id=45819
We noticed that when building an executable with ThinLTO (cache enabled), from time to time unexpected cache misses happening and new cache entries are generated when not needed. It doesn't happen in a predictable manner (i.e. a cache miss might or might not happen).
‘ExportList’(‘DenseSet’ of ValueInfo's) is used for the calculation of a hash value (LTO Cache Key).
Though the elements of this DenseSet are the same all the time, the order in which the iterator walks through the elements of this set might (or might not) be different the next time when we relink with ThinLTO. If the order happens to be different, we will generate a different hash value and a cache miss will happen.
Looking at the implementation of:
template <> struct DenseMapInfo<ValueInfo> (see ModuleSummaryIndex.h)
we notice that the hash value that is being used by a DenseMap is a pointer:
static unsigned getHashValue(ValueInfo I) { return (uintptr_t)I.getRef(); }
We cannot guarantee that the compiler will allocate memory at the same
address for the object when we run the executable for the second time.
The same problem is applicable to ImportList as well.
A proposed solution here is:
(a) Create a vector of Export List's GUID (int64 integers) and sort this vector. Iterate through this sorted vector and use its elements for cache entry value calculations.
(b) Create a vector of ImportList's ModuleIDs (StringRefs) and sort this vector in lexicographical order. Iterate through this sorted vector. Use ModuleID as a key for obtaining an iterator to corresponding element of the ImportList, which has StringMap type. Use ModuleID (first element of the pair) and the list of GUIDs of all functions to import for this module (second element of the pair) for cache entry value calculations.
Note: it looks like I just spotted another potential problem in here. If confirmed, I will do a separate fix for it.
We use unordered_set container for storing GUIDs of all the functions to import for a source module.
using FunctionsToImportTy = std::unordered_set<GlobalValue::GUID>;
That means that in theory, when when we iterate through unordered_set for the second time, we could get a different order of the elements in this set, computing a different hash value and causing a potential cache miss.
Typo: "elemets"