This is an archive of the discontinued LLVM Phabricator instance.

[clang][Lexer] Speedup HeaderSearch when there are many HeaderMaps
ClosedPublic

Authored by troyj on Oct 12 2022, 11:23 AM.

Details

Summary

HeaderSearch already uses a caching system to avoid duplicate searches, but the initial cold miss can take a long time if a build system has supplied thousands of HeaderMaps. For this case, the SearchDirs vector begins with those HeaderMaps, so a cache miss requires testing if the sought filename is present in each of those maps. Instead, we can consolidate the keys of those HeaderMaps into one StringMap and then each cache miss can skip directly to the correct HeaderMap or continue its search beyond the initial sequence of HeaderMaps. In testing on TUs with ~15000 SearchDirs where the initial 99% are HeaderMaps, time spent in Clang was reduced by 15%. This patch is expected to be neutral for SearchDir vectors that do not begin with HeaderMaps.

Diff Detail

Event Timeline

troyj created this revision.Oct 12 2022, 11:23 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 12 2022, 11:23 AM
troyj requested review of this revision.Oct 12 2022, 11:23 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 12 2022, 11:23 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
bruno added a comment.Oct 12 2022, 9:28 PM

Thanks for working on this Troy, nice speed up.

A bit more context for reviewers: we're probably unique on how we use hmaps, we have tons of them in a single compiler invocation. The other users of hmaps I've seen don't use more than a handful. This means that general users shouldn't expect much disruption here.

clang/include/clang/Lex/HeaderMap.h
52

Please remove the curly braces for this if and similar ones in the rest of the patch (if present).

clang/include/clang/Lex/HeaderSearch.h
255

Did you end up getting data on the memory footprint of this map? It seems to me that it should be fine if header maps aren't much populated.

clang/lib/Lex/HeaderSearch.cpp
391

Remove this empty line.

392

What happens if different header maps contain the same header name?

1034

Looks like we could have only one call to CacheLookup.reset instead?

troyj added inline comments.Oct 13 2022, 8:35 AM
clang/include/clang/Lex/HeaderSearch.h
255

Right, with ~15000 header maps, this map has ~92000 entries, so each map contributes about 6 unique keys. The patch does not increase the peak memory usage of Clang.

clang/lib/Lex/HeaderSearch.cpp
392

The first one sticks because try_emplace doesn't insert if the key already exists. I'll add a comment.

1034

The reset call has a side effect of setting CacheLookup.MappedName to nullptr, so we can't call it unconditionally because the Hit case should not call it. The unpatched code combined the !SkipCache check and the Hit check, so it still called reset when SkipCache == true. I preserved that behavior because I think it's necessary. SkipCache seems to imply "don't consult the cache for this call" instead of "don't cache anything" so it still caches the result for a future LookupFile call that passes SkipCache == false.

troyj updated this revision to Diff 467496.Oct 13 2022, 8:38 AM

Addressed Bruno's comments.

troyj marked 5 inline comments as done.Oct 13 2022, 8:39 AM
bruno accepted this revision.Oct 17 2022, 11:18 PM

Thanks for the detailed explanations, LGTM!

This revision is now accepted and ready to land.Oct 17 2022, 11:18 PM