This is an archive of the discontinued LLVM Phabricator instance.

[SourceManager] Speedup getFileIDLocal with a separate Offset Table.
AcceptedPublic

Authored by hokein on Oct 7 2022, 4:18 AM.

Details

Summary

This patch introduces a separate Offset array in SourceManager, it is redundant
and merely used for searching. This "AOS vs. SOA" optimization give us a large
memory locality win, see the data below.

This is a low-hanging fruit optimization (without significantly changing the
SlocEntry and its underlying storge in SourceManager). The sad bit is thatit
increases SourceManager memory usage, however it is a small fraction (< 10%) of
clang AST memory usage, which I think it is mostly negligible.

getFileIDLoaded is excluded in this patch, it is trickier due to the lazy load of
SLocEntry, unclear we will gain as much speedup as the local one.

Speedup:

Linux kernel: getFileIDLocal overhead is reduced to 1.66% (~30% saving)

Memory usage in bytes (AST built in clangd):

SemaExpr.cpp:  149502874 ->149529494 
FindTarget.cpp  83177157 ->  83177569

Diff Detail

Event Timeline

hokein created this revision.Oct 7 2022, 4:18 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 7 2022, 4:18 AM
Herald added a subscriber: kadircet. · View Herald Transcript
hokein requested review of this revision.Oct 7 2022, 4:18 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 7 2022, 4:18 AM
nickdesaulniers added inline comments.Oct 7 2022, 2:40 PM
clang/include/clang/Basic/SourceManager.h
696

Add to this comment that the size is expected to stay in sync with LocalSLocEntryTable.

697

LocalSLocEntryTable is a vec of SrcMgr::SLocEntry. SrcMgr::SLocEntry's Offset is:

static constexpr int OffsetBits = 8 * sizeof(SourceLocation::UIntTy) - 1;
  SourceLocation::UIntTy Offset : OffsetBits;

So this is a vector of 32b values vs the 24b values we need. Does packing this by declaring a struct improve locality further?

diff --git a/clang/include/clang/Basic/SourceManager.h b/clang/include/clang/Basic/SourceManager.h
index 70ba85bbe948..518c6ed59120 100644
--- a/clang/include/clang/Basic/SourceManager.h
+++ b/clang/include/clang/Basic/SourceManager.h
@@ -692,9 +692,15 @@ class SourceManager : public RefCountedBase<SourceManager> {
   /// Positive FileIDs are indexes into this table. Entry 0 indicates an invalid
   /// expansion.
   SmallVector<SrcMgr::SLocEntry, 0> LocalSLocEntryTable;
+
+  struct SMOffset {
+    SourceLocation::UIntTy Offset : 24;
+    SMOffset (SourceLocation::UIntTy O): Offset(O){}
+    operator SourceLocation::UIntTy() const { return Offset; }
+  };
   /// An in-parallel SlocEntry offset table, merely used for speeding up the
   /// FileID lookup.
-  SmallVector<SourceLocation::UIntTy> LocalLocOffsetTable;
+  SmallVector<SMOffset> LocalLocOffsetTable;
 
   /// The table of SLocEntries that are loaded from other modules.
   ///
clang/lib/Basic/SourceManager.cpp
676

should we add some asserts that LocalSLocEntryTable.size() == LocalLocOffsetTable.size() somewhere?

nickdesaulniers added inline comments.Oct 7 2022, 2:50 PM
clang/include/clang/Basic/SourceManager.h
697

struct SMOffset { ... } __attribute__((packed));

sammccall added a subscriber: aaron.ballman.

This looks good to me, I think we should ask @aaron.ballman about the memory increase.

If we wanted to eliminate SLocEntry::Offset then we could add SourceManager::getOffset(FileID) based on this table and deprecate SLocEntry::getOffset().
We'd have to tackle loaded SLocs for that though.

SemaExpr.cpp: 149529494 -> 149502874

Are these numbers backwards? Why would memory usage go down?

clang/include/clang/Basic/SourceManager.h
697

So this is a vector of 32b values vs the 24b values we need

It's 31 bits: 8 * sizeof - 1, not 8 * (sizeof - 1).
(And none of those bits are spare in practice)

It'd be great to have some numbers for overall memory increase based on clang rather than clangd, that's a more critical case and has a different memory pattern.

hokein updated this revision to Diff 466595.Oct 10 2022, 1:03 PM

print memory usage in SourceManager::PrintStats.

hokein edited the summary of this revision. (Show Details)Oct 10 2022, 1:03 PM

SemaExpr.cpp: 149529494 -> 149502874

Are these numbers backwards? Why would memory usage go down?

Oops, you're right, I made a mistake, correct it in the description.

More data on clang memory usage (measured with the clang -cc1 -fsyntax-only -print-stats, AST size = ASTContext + Decl + Stmt/Expr; Total = AST size + SourceManager size)

The memory increase for C++ is negligible (< 1%), but for C, we have ~5% memory increase.

AST sizeSourceManager sizeTotal
SemaExpr.cpp275.1 MB15MB -> 16.6 MB290.1MB-> 291.7MB
FindTarget.cpp131.9MB14.3MB -> 15.9MB146.2MB -> 147.8 MB
drm_property.c11.4MB3.6MB -> 4.4 MB15MB -> 15.8MB
nl80211.c28.4MB26.4MB -> 29.6MB54.8MB -> 58MB
nickdesaulniers accepted this revision.Oct 10 2022, 1:11 PM

Is it worth it and possible to fully decompose LocalSLocEntryTable into arrays of its constituent parts, and only construct a SLocEntry when necessary?

I'm ok with the cost increase; please also adding asserts to keep the size of the vectors equivalent.

This revision is now accepted and ready to land.Oct 10 2022, 1:11 PM
hokein updated this revision to Diff 466729.Oct 11 2022, 1:36 AM
hokein marked an inline comment as done.

add assert

Is it worth it and possible to fully decompose LocalSLocEntryTable into arrays of its constituent parts, and only construct a SLocEntry when necessary?

Fully decomposing LocalSLocEntryTable into arrays seems a good idea (which will get us more memory back, since we don't need to pay the 4 padding bytes for SLocEntry).
But I think constructing a SLocEntry every time we call getSLocEntry would be expensive since this is a hot function. Instead, we should probably deprecate the SLocEntry structure at all -- adding getOffset(FileID), getFileInfo(FileID), getExpansionInfo(FileID) to the SourceManager, this would take some more work (update all SLocEntry usage, tackle loadedSLocEntry` etc).

I'm ok with the cost increase; please also adding asserts to keep the size of the vectors equivalent.

added. I'd wait for a few days before landing it for @aaron.ballman's opinion.

Speedup: Linux kernel: getFileIDLocal overhead is reduced to 1.66% (~30% saving)

What's the fraction related to the whole parsing time?

The sad bit is thatit increases SourceManager memory usage, however it is a small fraction (< 10%) of clang AST memory usage, which I think it is mostly negligible.

What's the fraction related to the whole memory usage?

Have you tried decreasing the number of linear probing (8) or switching to branch-less binary search?