Otherwise there are cache misses just from changing the name of a path, even though the input modules did not change.
rdar://109672225
Differential D151165
[ThinLTO] Make the cache key independent of the module identifier paths akyrtzi on May 22 2023, 3:20 PM. Authored by
Details Otherwise there are cache misses just from changing the name of a path, even though the input modules did not change. rdar://109672225
Diff Detail
Unit Tests
Event TimelineComment Actions Use Index.getModule() to get both ModuleId and the hash, so that we do only one lookup for both. Comment Actions @tejohnson Can you please confirm the correctness of this change? This change has caused major compile-time regressions in Rust (compiled modules no longer get reused in incremental builds), because modules get added to the index in different orders across compilations. On the surface, it seems like we could easily work around this by adding a sort over the module key. However, the modules involved in the compilation can actually change, e.g. if references to symbols in a module are added/removed so that the module does not need to be linked at all (or starts needing to be linked). Even with a sort, this is going to shift the module indices, and result in at least unnecessary cache invalidation (I'm not sure it can result in result in incorrect cache reuse). The basic premise of this patch (that we can use the module order instead of the module key) seems to be premised on a specific compilation model that is not valid for all thin lto consumers. Comment Actions What about changing the code to sort using the module hash: llvm::sort(ImportModulesVector, [](const ImportModule &Lhs, const ImportModule &Rhs) -> bool { - return Lhs.getId() < Rhs.getId(); + return Lhs.getHash() < Rhs.getHash(); }); would that resolve your issue? Comment Actions Thanks for the report, yeah I can see how using the module id here can result in spurious differences. Using the module hash should address the issue I think. This is better, @nikic can you confirm? The module ID is an older concept that predates the module hash. We should probably remove that from the in-memory index completely, since it just takes up space and can lead to confusion about what values are stable and should be used. The numeric id is utilized in the Bitcode format for compactness, but I don't think we need it in memory anymore (quick scan of the codebase suggests not). Let me see if I can remove that once this issue is fixed. Comment Actions I've just confirmed that using the hash works for the original, unreduced test cases as well. |