Page MenuHomePhabricator

[clang][SourceManager] cache Macro Expansions
ClosedPublic

Authored by nickdesaulniers on May 27 2020, 5:43 PM.

Details

Summary

A seemingly innocuous Linux kernel change [0] seemingly blew up our
compile times by over 3x, as reported by @nathanchance in [1].

The code in question uses a doubly nested macro containing GNU C
statement expressions that are then passed to typeof(), which is then
used in a very important macro for atomic variable access throughout
most of the kernel. The inner most macro, is passed a GNU C statement
expression. In this case, we have macro arguments that are GNU C
statement expressions, which can contain a significant number of tokens.
The upstream kernel patch caused significant build time regressions for
both Clang and GCC. Since then, some of the nesting has been removed via
@melver, which helps gain back most of the lost compilation time. [2]

Profiles collected [3] from compilations of the slowest TU for us in the
kernel show:

  • 51.4% time spent in clang::TokenLexer::updateLocForMacroArgTokens
  • 48.7% time spent in clang::SourceManager::getFileIDLocal
  • 35.5% time spent in clang::SourceManager::isOffsetInFileID

(mostly calls from the former through to the latter).

So it seems we have a pathological case for which properly tracking the
SourceLocation of macro arguments is significantly harming build
performance. This stands out in referenced flame graph.

In fact, this case was identified previously as being problematic in
commit 3339c568c4 ("[Lex] Speed up updateConsecutiveMacroArgTokens (NFC)")

Looking at the above call chain, there's 3 things we can do to speed up
this case.

  1. TokenLexer::updateConsecutiveMacroArgTokens() calls SourceManager::isWrittenInSameFile() which calls SourceManager::getFileID(), which is both very hot and very expensive to call. SourceManger has a one entry cache, member LastFileIDLookup. If that isn't the FileID for a give source location offset, we fall back to a linear probe, and then to a binary search for the FileID. These fallbacks update the one entry cache, but noticeably they do not for the case of macro expansions!

    For the slowest TU to compile in the Linux kernel, it seems that we miss about 78.67% of the 68 million queries we make to getFileIDLocal that we could have had cache hits for, had we saved the macro expansion source location's FileID in the one entry cache. [4]

    I tried adding a separate cache item for macro expansions, and to check that before the linear then binary search fallbacks, but did not find it faster than simply allowing macro expansions into the one item cache. This alone nets us back a lot of the performance loss.

    That said, this is a modification of caching logic, which is playing with a double edged sword. While it significantly improves the pathological case, its hard to say that there's not an equal but opposite pathological case that isn't regressed by this change. Though non-pathological cases of builds of the Linux kernel before [0] are only slightly improved (<1%) and builds of LLVM itself don't change due to this patch.

    Should future travelers find this change to significantly harm their build times, I encourage them to feel empowered to revert this change.
  1. SourceManager::getFileIDLocal has a FIXME hinting that the call to SourceManager::isOffsetInFileID could be made much faster since isOffsetInFileID is generic in the sense that it tries to handle the more generic case of "local" (as opposed to "loaded") files, though the caller has already determined the file to be local. This patch implements a new method that specialized for use when the caller already knows the file is local, then use that in TokenLexer::updateLocForMacroArgTokens. This should be less controversial than 1, and is likely an across the board win. It's much less significant for the pathological case, but still a measurable win once we have fallen to the final case of binary search. D82497
  1. A bunch of methods in SourceManager take a default argument. SourceManager::getLocalSLocEntry doesn't do anything with this argument, yet many callers of getLocalSLocEntry setup, pass, then check this argument. This is wasted work. D82498

With this patch applied, the above profile [5] for the same pathological
input looks like:

  • 25.1% time spent in clang::TokenLexer::updateLocForMacroArgTokens
  • 17.2% time spent in clang::SourceManager::getFileIDLocal

and clang::SourceManager::isOffsetInFileID is no longer called, and thus
falls out of the profile.

There may be further improvements to the general problem of "what
interval contains one number out of millions" than the current use of a
one item cache, followed by linear probing, followed by binary
searching. We might even be able to do something smarter in
TokenLexer::updateLocForMacroArgTokens.

[0] https://github.com/ClangBuiltLinux/linux/commit/cdd28ad2d8110099e43527e96d059c5639809680
[1] https://github.com/ClangBuiltLinux/linux/issues/1032
[2] https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=locking/kcsan&id=a5dead405f6be1fb80555bdcb77c406bf133fdc8
[3] https://github.com/ClangBuiltLinux/linux/issues/1032#issuecomment-633712667
[4] https://github.com/ClangBuiltLinux/linux/issues/1032#issuecomment-633741923
[5] https://github.com/ClangBuiltLinux/linux/issues/1032#issuecomment-634932736

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptMay 27 2020, 5:43 PM

Bumping for review. Are there other trusted reviewers we could add the to review to help?

Bumping for review.

I am not the best person to review this, but dig a little bit into your issues and history of this file. Your diagnosis and fix seems reasonable, AFAICT the existing caching behavior was chosen to optimize files with many macro invocations that are expanding into relatively few tokens. So this case will definitely regress after this patch, proportional to number_of_macro_name_tokens/number_of_total_tokens but I would expect that ratio to be quite small in any sane file hence regression to be negligible.

OTOH, when that ratio is low, if number of tokens coming from each/some macro expansions is huge (which IIUC is your case) updating that cache makes a lot of sense.

Also your claim on builds of LLVM itself don't change due to this patch. sounds promising. I've seen quite nice benchmarks for kernel build times in the issues you've mentioned it would be nice if you could back that claim with similar data(not that I don't take your word for it, I just love tables and I am jealous that kernel has some but we don't :( )

Apart from all of this, I believe this patch (even though has relatively few lines) contains 3 distinct patches that should be split:

  • Changing caching behaviour to apply to macro expansions
  • Adding a more optimized way of checking if an offset is inside a local file
  • Getting rid of some "dead" code.

I can say this patch LG from my side, but my understanding of this code isn't quite strong so I would wait for a couple more days to see if someone else will chime in.

clang/include/clang/Basic/SourceManager.h
1747 ↗(On Diff #266711)

i think this deserves its separate patch.

We still can satisfy the interface requirements of libclang by introducing another member getLocalSLocEntryWithInvalid and change the assertion into an if check, populating Invalid in case of failure, what to return is more tricky though it is a const ref, I suppose we can return a dummy entry.

Feel free to keep the newly added calls without an Invalid argument, but I wouldn't modify the already existing ones in this patch.

clang/lib/Basic/SourceManager.cpp
903

we already call getLocalSLocEntry for MiddleIndex above, moreover we even early exit if (SlocOffset < MidOffset) which is the second case in your new function. So I would say we could even gain more by:

const auto &SLocForMiddle = getLocalSLocEntry(MiddleIndex);
auto MidOffset = SLocForMiddle.getOffset();
.
.
.
if (MiddleIndex + 1 ==LocalSLocEntryTable.size() || SLocOffset < getLocalSLocEntry(MiddleIndex + 1).getOffset()) {
// we've got a hit!
}

That way you also wouldn't need to introduce a new function.

Even though there's value in new helpers, IMO SourceManager is already complicated and has enough variants of each function to shoot yourself in the foot. If you want feel free to put a FIXME saying we can extract a isOffsetInLocalFile helper from these lines if usage is more spread.

Again I would say this is worth doing in a separate patch. As it should be a clear win for all.

971

my comments above for isOffsetnLocalFileID applies to here as well, in case you decide to move them into a separate patch.

Also forgot to say, thanks a lot for taking your time to investigate this issue and coming up with a patch!

I've seen quite nice benchmarks for kernel build times in the issues you've mentioned it would be nice if you could back that claim with similar data(not that I don't take your word for it, I just love tables and I am jealous that kernel has some but we don't :( )

Heh, those were painstakingly written in markdown by hand, using CC="/usr/bin/time -v clang" to test. The weren't statistically significant (N=1), nor were they autogenerated. My claims about builds of LLVM are similarly non-scientific in terms of statistical significance.

I can say this patch LG from my side, but my understanding of this code isn't quite strong so I would wait for a couple more days to see if someone else will chime in.
Also forgot to say, thanks a lot for taking your time to investigate this issue and coming up with a patch!

That's ok, I think generally finding someone to take the time to review pieces of a large project is generally difficult, and that contributors should feel empowered to be able to sign off on modifications to code they themselves did not write. Thank you for taking the time to review it!

I will break this up into 3 patches, and ask you kindly to review once I have them ready. I will post them here, too, in case other reviewers or subscribers would like to follow along, then I'll Abandon this revision.

clang/include/clang/Basic/SourceManager.h
1747 ↗(On Diff #266711)

the interface requirements of libclang

Oh? I must have missed that. Can you tell me more about that interface? I would have thought this would fail to build if I messed up an interface?

clang/lib/Basic/SourceManager.cpp
903

Ah, right! This is a great suggestion! And it looks correct to me.

ychen added a subscriber: ychen.Tue, Jun 16, 3:05 PM

Heh, those were painstakingly written in markdown by hand, using CC="/usr/bin/time -v clang" to test. The weren't statistically significant (N=1), nor were they autogenerated. My claims about builds of LLVM are similarly non-scientific in terms of statistical significance.

Thanks, I believe it would be still good to provide those numbers for LLVM too (even if they are non-scientific), for the sake of future travellers that want to play with caching logic here.

I will break this up into 3 patches, and ask you kindly to review once I have them ready. I will post them here, too, in case other reviewers or subscribers would like to follow along, then I'll Abandon this revision.

thanks a lot, I've provided some details on your libclang interface comment below. but you might want to hold off on dead-code elimination patch, as it is in a quite fragile state and libclang stability is a concern I am not too familiar with.

clang/include/clang/Basic/SourceManager.h
1747 ↗(On Diff #266711)

signature requirement is specified in https://github.com/llvm/llvm-project/blob/master/clang/tools/libclang/CIndexInclusionStack.cpp#L21, but that's actually an implementation file. So one could change the details instead and keep libclang API the same (or as mentioned above, you can just introduce a new SourceManager member that will fill in Invalid)

But the more I looked at this code, it started looking more iffy. It's getLoadedSLocEntry sibling is operating on a vector, asserting in a couple places and as a fallback just creating a new entry at given index via lodadedentriesvector[index] = somethingwithout caring about OOB access as it asserted previously. So this whole Invalid shenanigan looks ... weird.

Satisfying a const ref return type with an optional Invalid flag seems implausible to me :/

I've forked off

  1. D82497 (getFileIDLocal optimizations)
  2. D82498 (don't check Invalid for calls to getLocalSLocEntry)

I agree those should be the less controversial parts of this diff. Once those are landed, I'll rebase this change to just contain the caching logic change.

clang/lib/Basic/SourceManager.cpp
971

This case getFileIDLoaded() did not show up in any traces. It also doesn't have the same FIXME (though that doesn't really matter). As such, I'm more inclined to "let sleeping dogs lie."

nickdesaulniers edited the summary of this revision. (Show Details)Fri, Jun 26, 10:58 AM
nickdesaulniers marked 4 inline comments as done.

Cool, should be good for review. This change is now nice an isolated, in case we need to revert.

kadircet accepted this revision.Fri, Jun 26, 11:53 AM

LGTM!

Thanks for bearing with me on splitting the patches and such! Let's land it and see if this is going to regress any projects.

This revision is now accepted and ready to land.Fri, Jun 26, 11:53 AM
This revision was automatically updated to reflect the committed changes.

I appreciate you taking the time to review, and find additional optimizations! It's not populated yet, but IIUC, this should be the link with more test results: http://llvm-compile-time-tracker.com/compare.php?from=7cc5307c73caa72f22edd4208b175e3c36eec46e&to=dffc1420451f674731cb36799c8ae084104ff0b5&stat=instructions

Again, if future travelers find that this change tickles a pathological case in their builds, please feel empowered to revert. We can always try to thread through the current API that we're expanding tokens of a GNU C Statement Expression, and thus should cache macro expansions. This is just less invasive and I didn't measure any slowdowns locally (only speedups for my pathological case).

clang/lib/Basic/SourceManager.cpp
901

oh, this case here should have been removed, too.

clang/lib/Basic/SourceManager.cpp
901

https://reviews.llvm.org/D82690

Alternatively, I could revert dffc1420451f674, and reland with this hunk? That way if a revert is necessary in the future, it's only one. Also so that before/after metrics are together and not across two separate patches.