HeaderSearch already uses a caching system to avoid duplicate searches, but the initial cold miss can take a long time if a build system has supplied thousands of HeaderMaps. For this case, the SearchDirs vector begins with those HeaderMaps, so a cache miss requires testing if the sought filename is present in each of those maps. Instead, we can consolidate the keys of those HeaderMaps into one StringMap and then each cache miss can skip directly to the correct HeaderMap or continue its search beyond the initial sequence of HeaderMaps. In testing on TUs with ~15000 SearchDirs where the initial 99% are HeaderMaps, time spent in Clang was reduced by 15%. This patch is expected to be neutral for SearchDir vectors that do not begin with HeaderMaps.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Thanks for working on this Troy, nice speed up.
A bit more context for reviewers: we're probably unique on how we use hmaps, we have tons of them in a single compiler invocation. The other users of hmaps I've seen don't use more than a handful. This means that general users shouldn't expect much disruption here.
clang/include/clang/Lex/HeaderMap.h | ||
---|---|---|
52 | Please remove the curly braces for this if and similar ones in the rest of the patch (if present). | |
clang/include/clang/Lex/HeaderSearch.h | ||
255 | Did you end up getting data on the memory footprint of this map? It seems to me that it should be fine if header maps aren't much populated. | |
clang/lib/Lex/HeaderSearch.cpp | ||
391 | Remove this empty line. | |
392 | What happens if different header maps contain the same header name? | |
1035 | Looks like we could have only one call to CacheLookup.reset instead? |
clang/include/clang/Lex/HeaderSearch.h | ||
---|---|---|
255 | Right, with ~15000 header maps, this map has ~92000 entries, so each map contributes about 6 unique keys. The patch does not increase the peak memory usage of Clang. | |
clang/lib/Lex/HeaderSearch.cpp | ||
392 | The first one sticks because try_emplace doesn't insert if the key already exists. I'll add a comment. | |
1035 | The reset call has a side effect of setting CacheLookup.MappedName to nullptr, so we can't call it unconditionally because the Hit case should not call it. The unpatched code combined the !SkipCache check and the Hit check, so it still called reset when SkipCache == true. I preserved that behavior because I think it's necessary. SkipCache seems to imply "don't consult the cache for this call" instead of "don't cache anything" so it still caches the result for a future LookupFile call that passes SkipCache == false. |
Please remove the curly braces for this if and similar ones in the rest of the patch (if present).