The ModuleManager's use of FileEntry nodes as the keys for its map of
loaded modules is less than ideal. Uniqueness for FileEntry nodes is
maintained by FileManager, which in turn uses inode numbers on hosts
that support that. When coupled with the module cache's proclivity for
turning over and deleting stale PCMs, this means entries for different
module files can wind up reusing the same underlying inode. When this
happens, subsequent accesses to the Modules map will disagree on the
ModuleFile associated with a given file.
It's fine to use the file management utilities to guarantee the presence
of module data, but we need a better source of key material that is
invariant with respect to these OS-level details. Using file paths alone
is a similarly frought solution because the ASTWriter performs a custom
canonicalization step (that is not equivalent to path canonicalization)
that renders keys from loaded AST cores useless for looking up cached
entries.
To mitigate the effects of inode reuse, increase the entropy of the key
material by incorporating the modtime and size. This ultimately
decreases the likelihood that a PCM that is swapped on disk will confuse
the cache, but it does not eliminate the possibility of collisions.
rdar://48443680
Can you add a doxygen comment explaining why we compute our own hashing as opposed to using the FileEntry pointer?