In one of our links lld was reading ~760k files, but only ~1500 of them were unique. With this caching in place, that link goes from 30 seconds to 8.
This seems like a heavy hammer, especially since some things don't need
to be cached, like the filelist arguments and the passed static
archives (the latter are already cached as a one-off), but it seems ld64
does something similar here to short-circuit these duplicate reads.
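For illustration, here is a minimal sketch of the kind of read-level memoization described above. It is plain C++ rather than lld's actual types or API; the `readFileCached` function and `gReadCache` map are made-up names for this example.

```cpp
// Minimal sketch (not lld's actual code): the first read of a path loads and
// stores the buffer; later reads of the same path return the cached copy.
#include <fstream>
#include <memory>
#include <optional>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical cache keyed by the path string.
static std::unordered_map<std::string, std::shared_ptr<const std::string>> gReadCache;

std::optional<std::shared_ptr<const std::string>> readFileCached(const std::string &path) {
  if (auto it = gReadCache.find(path); it != gReadCache.end())
    return it->second;                 // duplicate read: no disk I/O
  std::ifstream in(path, std::ios::binary);
  if (!in)
    return std::nullopt;               // let the caller handle a missing file
  std::ostringstream buf;
  buf << in.rdbuf();
  auto contents = std::make_shared<const std::string>(buf.str());
  gReadCache.emplace(path, contents);  // remember the buffer for later reads
  return contents;
}
```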
Of the types of files being read for our iOS app, the biggest problem
was constantly re-reading small tbd files:
```
% wc -l /tmp/read.txt
761414 /tmp/read.txt
% cat /tmp/read.txt | sort -u | wc -l
1503
% cat /tmp/read.txt | grep "\.a$" | wc -l
43721
% cat /tmp/read.txt | grep "\.tbd$" | wc -l
717656
```
We could likely hoist this logic up so we don't cache at this level, but it
would be a more invasive change to make sure all callers that need it
cache the results.
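For contrast, a rough sketch of what that hoisting would mean: every call site that can see the same input more than once would need to keep and consult its own map, which is the invasiveness referred to above. The function below and its name are hypothetical.

```cpp
// Hypothetical caller-side caching: each call site that may see duplicate
// paths keeps its own map instead of relying on a shared read-level cache.
#include <fstream>
#include <memory>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

void loadTbdInputs(const std::vector<std::string> &tbdPaths) {
  std::unordered_map<std::string, std::shared_ptr<const std::string>> localCache;
  for (const std::string &path : tbdPaths) {
    auto &slot = localCache[path];
    if (!slot) {
      // Uncached read; only this function remembers the result.
      std::ifstream in(path, std::ios::binary);
      if (!in)
        continue;
      std::ostringstream buf;
      buf << in.rdbuf();
      slot = std::make_shared<const std::string>(buf.str());
    }
    // ... parse the .tbd contents in *slot ...
  }
}
```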
I could see this causing OOM issues, and I'm not a linker expert, so
maybe there's another way we should solve this problem? Feedback welcome!
It would be nice to have a comment here about how this is primarily for caching .tbd / common library files and is a bit of a heavy hammer (as mentioned in the commit message).