This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
clang/lib/Basic/
-
lib/
-
Basic/
4/6
SourceManager.cpp

Differential D80681

[clang][SourceManager] cache Macro Expansions
ClosedPublic

Authored by nickdesaulniers on May 27 2020, 5:43 PM.

Download Raw Diff

Details

Reviewers

rsmith
bkramer
sammccall
kadircet

Commits

rG7b8cf98b4a9a: Reland "[clang][SourceManager] cache Macro Expansions""
rGdffc1420451f: [clang][SourceManager] cache Macro Expansions

Summary

A seemingly innocuous Linux kernel change [0] seemingly blew up our
compile times by over 3x, as reported by @nathanchance in [1].

The code in question uses a doubly nested macro containing GNU C
statement expressions that are then passed to typeof(), which is then
used in a very important macro for atomic variable access throughout
most of the kernel. The inner most macro, is passed a GNU C statement
expression. In this case, we have macro arguments that are GNU C
statement expressions, which can contain a significant number of tokens.
The upstream kernel patch caused significant build time regressions for
both Clang and GCC. Since then, some of the nesting has been removed via
@melver, which helps gain back most of the lost compilation time. [2]

Profiles collected [3] from compilations of the slowest TU for us in the
kernel show:

51.4% time spent in clang::TokenLexer::updateLocForMacroArgTokens
48.7% time spent in clang::SourceManager::getFileIDLocal
35.5% time spent in clang::SourceManager::isOffsetInFileID

(mostly calls from the former through to the latter).

So it seems we have a pathological case for which properly tracking the
SourceLocation of macro arguments is significantly harming build
performance. This stands out in referenced flame graph.

In fact, this case was identified previously as being problematic in
commit 3339c568c4 ("[Lex] Speed up updateConsecutiveMacroArgTokens (NFC)")

Looking at the above call chain, there's 3 things we can do to speed up
this case.

TokenLexer::updateConsecutiveMacroArgTokens() calls SourceManager::isWrittenInSameFile() which calls SourceManager::getFileID(), which is both very hot and very expensive to call. SourceManger has a one entry cache, member LastFileIDLookup. If that isn't the FileID for a give source location offset, we fall back to a linear probe, and then to a binary search for the FileID. These fallbacks update the one entry cache, but noticeably they do not for the case of macro expansions!

For the slowest TU to compile in the Linux kernel, it seems that we miss about 78.67% of the 68 million queries we make to getFileIDLocal that we could have had cache hits for, had we saved the macro expansion source location's FileID in the one entry cache. [4]

I tried adding a separate cache item for macro expansions, and to check that before the linear then binary search fallbacks, but did not find it faster than simply allowing macro expansions into the one item cache. This alone nets us back a lot of the performance loss.

That said, this is a modification of caching logic, which is playing with a double edged sword. While it significantly improves the pathological case, its hard to say that there's not an equal but opposite pathological case that isn't regressed by this change. Though non-pathological cases of builds of the Linux kernel before [0] are only slightly improved (<1%) and builds of LLVM itself don't change due to this patch.

Should future travelers find this change to significantly harm their build times, I encourage them to feel empowered to revert this change.

SourceManager::getFileIDLocal has a FIXME hinting that the call to SourceManager::isOffsetInFileID could be made much faster since isOffsetInFileID is generic in the sense that it tries to handle the more generic case of "local" (as opposed to "loaded") files, though the caller has already determined the file to be local. This patch implements a new method that specialized for use when the caller already knows the file is local, then use that in TokenLexer::updateLocForMacroArgTokens. This should be less controversial than 1, and is likely an across the board win. It's much less significant for the pathological case, but still a measurable win once we have fallen to the final case of binary search. D82497

A bunch of methods in SourceManager take a default argument. SourceManager::getLocalSLocEntry doesn't do anything with this argument, yet many callers of getLocalSLocEntry setup, pass, then check this argument. This is wasted work. D82498

With this patch applied, the above profile [5] for the same pathological
input looks like:

25.1% time spent in clang::TokenLexer::updateLocForMacroArgTokens
17.2% time spent in clang::SourceManager::getFileIDLocal

and clang::SourceManager::isOffsetInFileID is no longer called, and thus
falls out of the profile.

There may be further improvements to the general problem of "what
interval contains one number out of millions" than the current use of a
one item cache, followed by linear probing, followed by binary
searching. We might even be able to do something smarter in
TokenLexer::updateLocForMacroArgTokens.

[0] https://github.com/ClangBuiltLinux/linux/commit/cdd28ad2d8110099e43527e96d059c5639809680
[1] https://github.com/ClangBuiltLinux/linux/issues/1032
[2] https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=locking/kcsan&id=a5dead405f6be1fb80555bdcb77c406bf133fdc8
[3] https://github.com/ClangBuiltLinux/linux/issues/1032#issuecomment-633712667
[4] https://github.com/ClangBuiltLinux/linux/issues/1032#issuecomment-633741923
[5] https://github.com/ClangBuiltLinux/linux/issues/1032#issuecomment-634932736

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

nickdesaulniers created this revision.May 27 2020, 5:43 PM

Herald added a project: Restricted Project. · View Herald TranscriptMay 27 2020, 5:43 PM

Herald added subscribers: cfe-commits, jfb, kristof.beyls. · View Herald Transcript

Harbormaster failed remote builds in B58144: Diff 266711!May 27 2020, 6:02 PM

Bumping for review. Are there other trusted reviewers we could add the to review to help?

nickdesaulniers added reviewers: sammccall, kadircet.Jun 3 2020, 11:54 AM

Bumping for review.

I am not the best person to review this, but dig a little bit into your issues and history of this file. Your diagnosis and fix seems reasonable, AFAICT the existing caching behavior was chosen to optimize files with many macro invocations that are expanding into relatively few tokens. So this case will definitely regress after this patch, proportional to number_of_macro_name_tokens/number_of_total_tokens but I would expect that ratio to be quite small in any sane file hence regression to be negligible.

OTOH, when that ratio is low, if number of tokens coming from each/some macro expansions is huge (which IIUC is your case) updating that cache makes a lot of sense.

Also your claim on builds of LLVM itself don't change due to this patch. sounds promising. I've seen quite nice benchmarks for kernel build times in the issues you've mentioned it would be nice if you could back that claim with similar data(not that I don't take your word for it, I just love tables and I am jealous that kernel has some but we don't :( )

Apart from all of this, I believe this patch (even though has relatively few lines) contains 3 distinct patches that should be split:

Changing caching behaviour to apply to macro expansions
Adding a more optimized way of checking if an offset is inside a local file
Getting rid of some "dead" code.

I can say this patch LG from my side, but my understanding of this code isn't quite strong so I would wait for a couple more days to see if someone else will chime in.

clang/include/clang/Basic/SourceManager.h
1747 ↗	(On Diff #266711)	i think this deserves its separate patch. We still can satisfy the interface requirements of libclang by introducing another member `getLocalSLocEntryWithInvalid` and change the assertion into an if check, populating `Invalid` in case of failure, what to return is more tricky though it is a const ref, I suppose we can return a dummy entry. Feel free to keep the newly added calls without an `Invalid` argument, but I wouldn't modify the already existing ones in this patch.
clang/lib/Basic/SourceManager.cpp
903	we already call getLocalSLocEntry for `MiddleIndex` above, moreover we even early exit `if (SlocOffset < MidOffset)` which is the second case in your new function. So I would say we could even gain more by: const auto &SLocForMiddle = getLocalSLocEntry(MiddleIndex); auto MidOffset = SLocForMiddle.getOffset(); . . . if (MiddleIndex + 1 ==LocalSLocEntryTable.size() \|\| SLocOffset < getLocalSLocEntry(MiddleIndex + 1).getOffset()) { // we've got a hit! } That way you also wouldn't need to introduce a new function. Even though there's value in new helpers, IMO `SourceManager` is already complicated and has enough variants of each function to shoot yourself in the foot. If you want feel free to put a FIXME saying `we can extract a isOffsetInLocalFile helper from these lines if usage is more spread`. Again I would say this is worth doing in a separate patch. As it should be a clear win for all.
971	my comments above for `isOffsetnLocalFileID` applies to here as well, in case you decide to move them into a separate patch.

Also forgot to say, thanks a lot for taking your time to investigate this issue and coming up with a patch!

I've seen quite nice benchmarks for kernel build times in the issues you've mentioned it would be nice if you could back that claim with similar data(not that I don't take your word for it, I just love tables and I am jealous that kernel has some but we don't :( )

Heh, those were painstakingly written in markdown by hand, using CC="/usr/bin/time -v clang" to test. The weren't statistically significant (N=1), nor were they autogenerated. My claims about builds of LLVM are similarly non-scientific in terms of statistical significance.

I can say this patch LG from my side, but my understanding of this code isn't quite strong so I would wait for a couple more days to see if someone else will chime in.
Also forgot to say, thanks a lot for taking your time to investigate this issue and coming up with a patch!

That's ok, I think generally finding someone to take the time to review pieces of a large project is generally difficult, and that contributors should feel empowered to be able to sign off on modifications to code they themselves did not write. Thank you for taking the time to review it!

I will break this up into 3 patches, and ask you kindly to review once I have them ready. I will post them here, too, in case other reviewers or subscribers would like to follow along, then I'll Abandon this revision.

clang/include/clang/Basic/SourceManager.h
1747 ↗	(On Diff #266711)	the interface requirements of libclang Oh? I must have missed that. Can you tell me more about that interface? I would have thought this would fail to build if I messed up an interface?
clang/lib/Basic/SourceManager.cpp
903	Ah, right! This is a great suggestion! And it looks correct to me.

ychen added a subscriber: ychen.Jun 16 2020, 3:05 PM

In D80681#2096886, @nickdesaulniers wrote:

Heh, those were painstakingly written in markdown by hand, using CC="/usr/bin/time -v clang" to test. The weren't statistically significant (N=1), nor were they autogenerated. My claims about builds of LLVM are similarly non-scientific in terms of statistical significance.

Thanks, I believe it would be still good to provide those numbers for LLVM too (even if they are non-scientific), for the sake of future travellers that want to play with caching logic here.

I will break this up into 3 patches, and ask you kindly to review once I have them ready. I will post them here, too, in case other reviewers or subscribers would like to follow along, then I'll Abandon this revision.

thanks a lot, I've provided some details on your libclang interface comment below. but you might want to hold off on dead-code elimination patch, as it is in a quite fragile state and libclang stability is a concern I am not too familiar with.

clang/include/clang/Basic/SourceManager.h
1747 ↗	(On Diff #266711)	signature requirement is specified in https://github.com/llvm/llvm-project/blob/master/clang/tools/libclang/CIndexInclusionStack.cpp#L21, but that's actually an implementation file. So one could change the details instead and keep libclang API the same (or as mentioned above, you can just introduce a new SourceManager member that will fill in Invalid) But the more I looked at this code, it started looking more iffy. It's `getLoadedSLocEntry` sibling is operating on a vector, asserting in a couple places and as a fallback just creating a new entry at given index via `lodadedentriesvector[index] = something`without caring about OOB access as it asserted previously. So this whole `Invalid` shenanigan looks ... weird. Satisfying a const ref return type with an optional `Invalid` flag seems implausible to me :/

nickdesaulniers mentioned this in D82497: [Clang][SourceManager] optimize getFileIDLocal().Jun 24 2020, 1:09 PM

nickdesaulniers mentioned this in D82498: [SourceManager] don't check invalid param of getLocalSLocEntry().Jun 24 2020, 1:24 PM

I've forked off

D82497 (getFileIDLocal optimizations)
D82498 (don't check Invalid for calls to getLocalSLocEntry)

I agree those should be the less controversial parts of this diff. Once those are landed, I'll rebase this change to just contain the caching logic change.

nickdesaulniers added inline comments.Jun 24 2020, 1:36 PM

clang/lib/Basic/SourceManager.cpp
971	This case `getFileIDLoaded()` did not show up in any traces. It also doesn't have the same FIXME (though that doesn't really matter). As such, I'm more inclined to "let sleeping dogs lie."

nickdesaulniers mentioned this in rG408efffbe4a5: [Clang][SourceManager] optimize getFileIDLocal().Jun 25 2020, 10:15 AM

nickdesaulniers mentioned this in rG8cce7af090bd: [SourceManager] don't check invalid param of getLocalSLocEntry().Jun 26 2020, 10:23 AM

rebase on D82497 & D82498

Cool, should be good for review. This change is now nice an isolated, in case we need to revert.

LGTM!

Thanks for bearing with me on splitting the patches and such! Let's land it and see if this is going to regress any projects.

This revision is now accepted and ready to land.Jun 26 2020, 11:53 AM

Harbormaster failed remote builds in B61961: Diff 273780!Jun 26 2020, 12:02 PM

Closed by commit rGdffc1420451f: [clang][SourceManager] cache Macro Expansions (authored by nickdesaulniers). · Explain WhyJun 26 2020, 1:08 PM

This revision was automatically updated to reflect the committed changes.

I appreciate you taking the time to review, and find additional optimizations! It's not populated yet, but IIUC, this should be the link with more test results: http://llvm-compile-time-tracker.com/compare.php?from=7cc5307c73caa72f22edd4208b175e3c36eec46e&to=dffc1420451f674731cb36799c8ae084104ff0b5&stat=instructions

Again, if future travelers find that this change tickles a pathological case in their builds, please feel empowered to revert. We can always try to thread through the current API that we're expanding tokens of a GNU C Statement Expression, and thus should cache macro expansions. This is just less invasive and I didn't measure any slowdowns locally (only speedups for my pathological case).

nickdesaulniers added inline comments.Jun 26 2020, 3:17 PM

clang/lib/Basic/SourceManager.cpp
901	oh, this case here should have been removed, too.

nickdesaulniers mentioned this in D82690: [clang][SourceManager] cache Macro Expansions pt.2.Jun 26 2020, 3:28 PM

nickdesaulniers added inline comments.Jun 26 2020, 3:30 PM

clang/lib/Basic/SourceManager.cpp
901	https://reviews.llvm.org/D82690 Alternatively, I could revert dffc1420451f674, and reland with this hunk? That way if a revert is necessary in the future, it's only one. Also so that before/after metrics are together and not across two separate patches.

Reverted in 7c2cb1448ad2d20e251db5e3ae4a0c84c12aa970, relanded in 7b8cf98b4a9a2f5ea3667fdbf913a4f8952ed36a.

Revision Contents

Path

Size

clang/

lib/

Basic/

SourceManager.cpp

14 lines

Diff 273823

clang/lib/Basic/SourceManager.cpp

Show First 20 Lines • Show All 855 Lines • ▼ Show 20 Lines	FileID SourceManager::getFileIDLocal(unsigned SLocOffset) const {

// Find the FileID that contains this. "I" is an iterator that points to a		// Find the FileID that contains this. "I" is an iterator that points to a
// FileID whose offset is known to be larger than SLocOffset.		// FileID whose offset is known to be larger than SLocOffset.
unsigned NumProbes = 0;		unsigned NumProbes = 0;
while (true) {		while (true) {
--I;		--I;
if (I->getOffset() <= SLocOffset) {		if (I->getOffset() <= SLocOffset) {
FileID Res = FileID::get(int(I - LocalSLocEntryTable.begin()));		FileID Res = FileID::get(int(I - LocalSLocEntryTable.begin()));
		// Remember it. We have good locality across FileID lookups.
// If this isn't an expansion, remember it. We have good locality across
// FileID lookups.
if (!I->isExpansion())
LastFileIDLookup = Res;		LastFileIDLookup = Res;
NumLinearScans += NumProbes+1;		NumLinearScans += NumProbes+1;
return Res;		return Res;
}		}
if (++NumProbes == 8)		if (++NumProbes == 8)
break;		break;
}		}

// Convert "I" back into an index. We know that it is an entry whose index is		// Convert "I" back into an index. We know that it is an entry whose index is
Show All 19 Lines	while (true) {

// If the middle index contains the value, succeed and return.		// If the middle index contains the value, succeed and return.
if (MiddleIndex + 1 == LocalSLocEntryTable.size() \|\|		if (MiddleIndex + 1 == LocalSLocEntryTable.size() \|\|
SLocOffset < getLocalSLocEntry(MiddleIndex + 1).getOffset()) {		SLocOffset < getLocalSLocEntry(MiddleIndex + 1).getOffset()) {
FileID Res = FileID::get(MiddleIndex);		FileID Res = FileID::get(MiddleIndex);

// If this isn't a macro expansion, remember it. We have good locality		// If this isn't a macro expansion, remember it. We have good locality
// across FileID lookups.		// across FileID lookups.
if (!LocalSLocEntryTable[MiddleIndex].isExpansion())		if (!LocalSLocEntryTable[MiddleIndex].isExpansion())
		nickdesaulniersAuthorUnsubmitted Not Done Reply Inline Actions oh, this case here should have been removed, too. nickdesaulniers: oh, this case here should have been removed, too.
		nickdesaulniersAuthorUnsubmitted Not Done Reply Inline Actions https://reviews.llvm.org/D82690 Alternatively, I could revert dffc1420451f674, and reland with this hunk? That way if a revert is necessary in the future, it's only one. Also so that before/after metrics are together and not across two separate patches. nickdesaulniers: https://reviews.llvm.org/D82690 Alternatively, I could revert dffc1420451f674, and reland with…
LastFileIDLookup = Res;		LastFileIDLookup = Res;
NumBinaryProbes += NumProbes;		NumBinaryProbes += NumProbes;
		kadircetUnsubmitted Done Reply Inline Actions we already call getLocalSLocEntry for `MiddleIndex` above, moreover we even early exit `if (SlocOffset < MidOffset)` which is the second case in your new function. So I would say we could even gain more by: const auto &SLocForMiddle = getLocalSLocEntry(MiddleIndex); auto MidOffset = SLocForMiddle.getOffset(); . . . if (MiddleIndex + 1 ==LocalSLocEntryTable.size() \|\| SLocOffset < getLocalSLocEntry(MiddleIndex + 1).getOffset()) { // we've got a hit! } That way you also wouldn't need to introduce a new function. Even though there's value in new helpers, IMO `SourceManager` is already complicated and has enough variants of each function to shoot yourself in the foot. If you want feel free to put a FIXME saying `we can extract a isOffsetInLocalFile helper from these lines if usage is more spread`. Again I would say this is worth doing in a separate patch. As it should be a clear win for all. kadircet: we already call getLocalSLocEntry for `MiddleIndex` above, moreover we even early exit `if…
		nickdesaulniersAuthorUnsubmitted Done Reply Inline Actions Ah, right! This is a great suggestion! And it looks correct to me. nickdesaulniers: Ah, right! This is a great suggestion! And it looks correct to me.
return Res;		return Res;
}		}

// Otherwise, move the low-side up to the middle index.		// Otherwise, move the low-side up to the middle index.
LessIndex = MiddleIndex;		LessIndex = MiddleIndex;
}		}
}		}

Show All 20 Lines	else
I = (-LastID - 2) + 1;		I = (-LastID - 2) + 1;

unsigned NumProbes;		unsigned NumProbes;
for (NumProbes = 0; NumProbes < 8; ++NumProbes, ++I) {		for (NumProbes = 0; NumProbes < 8; ++NumProbes, ++I) {
// Make sure the entry is loaded!		// Make sure the entry is loaded!
const SrcMgr::SLocEntry &E = getLoadedSLocEntry(I);		const SrcMgr::SLocEntry &E = getLoadedSLocEntry(I);
if (E.getOffset() <= SLocOffset) {		if (E.getOffset() <= SLocOffset) {
FileID Res = FileID::get(-int(I) - 2);		FileID Res = FileID::get(-int(I) - 2);

if (!E.isExpansion())
LastFileIDLookup = Res;		LastFileIDLookup = Res;
NumLinearScans += NumProbes + 1;		NumLinearScans += NumProbes + 1;
return Res;		return Res;
}		}
}		}

// Linear scan failed. Do the binary search. Note the reverse sorting of the		// Linear scan failed. Do the binary search. Note the reverse sorting of the
// table: GreaterIndex is the one where the offset is greater, which is		// table: GreaterIndex is the one where the offset is greater, which is
// actually a lower index!		// actually a lower index!
Show All 14 Lines	if (E.getOffset() > SLocOffset) {
if (GreaterIndex == MiddleIndex) {		if (GreaterIndex == MiddleIndex) {
assert(0 && "binary search missed the entry");		assert(0 && "binary search missed the entry");
return FileID();		return FileID();
}		}
GreaterIndex = MiddleIndex;		GreaterIndex = MiddleIndex;
continue;		continue;
}		}

if (isOffsetInFileID(FileID::get(-int(MiddleIndex) - 2), SLocOffset)) {		if (isOffsetInFileID(FileID::get(-int(MiddleIndex) - 2), SLocOffset)) {
		kadircetUnsubmitted Done Reply Inline Actions my comments above for `isOffsetnLocalFileID` applies to here as well, in case you decide to move them into a separate patch. kadircet: my comments above for `isOffsetnLocalFileID` applies to here as well, in case you decide to…
		nickdesaulniersAuthorUnsubmitted Done Reply Inline Actions This case `getFileIDLoaded()` did not show up in any traces. It also doesn't have the same FIXME (though that doesn't really matter). As such, I'm more inclined to "let sleeping dogs lie." nickdesaulniers: This case `getFileIDLoaded()` did not show up in any traces. It also doesn't have the same…
FileID Res = FileID::get(-int(MiddleIndex) - 2);		FileID Res = FileID::get(-int(MiddleIndex) - 2);
if (!E.isExpansion())
LastFileIDLookup = Res;		LastFileIDLookup = Res;
NumBinaryProbes += NumProbes;		NumBinaryProbes += NumProbes;
return Res;		return Res;
}		}

// Sanity checking, otherwise a bug may lead to hanging in release build.		// Sanity checking, otherwise a bug may lead to hanging in release build.
if (LessIndex == MiddleIndex) {		if (LessIndex == MiddleIndex) {
assert(0 && "binary search missed the entry");		assert(0 && "binary search missed the entry");
return FileID();		return FileID();
▲ Show 20 Lines • Show All 1,294 Lines • Show Last 20 Lines