This is an archive of the discontinued LLVM Phabricator instance.

Differential D79467

[PDB] Optimize public symbol processing
Closed, Public

Authored by rnk on May 5 2020, 7:48 PM.

Details

Reviewers

aganea
MaskRay
hans

Commits

rG3b3e28a07cf5: [PDB] Optimize public symbol processing

Summary

Reduces time to link PGO instrumented net_unittests.exe by 11% (9.766s ->
8.672s, best of three). Reduces peak memory by 65.7MB (2142.71MB ->
2076.95MB).

Use a more compact struct, BulkPublic, for faster sorting. Sort in
parallel. Construct the hash buckets in parallel. Try to use one vector
to hold all the publics instead of copying them from one to another.
Allocate all the memory needed to serialize publics up front, and then
serialize them in place in parallel.
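
The "allocate up front, then serialize in place" step can be sketched as two passes. This is an illustration only, not the patch's code: `CompactPublic`, `recordSize`, `layoutRecords`, `writeRecord` and `serializeAll` are hypothetical names, the record layout is a simplified S_PUB32-like layout assumed for the example, and fields are copied in host byte order (a little-endian host is assumed).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical compact record: a name pointer plus fixed-width fields.
struct CompactPublic {
  const char *Name = nullptr;
  uint32_t NameLen = 0;
  uint32_t SymOffset = 0; // filled in by layoutRecords()
  uint32_t Offset = 0;
  uint16_t Segment = 0;
  uint16_t Flags = 0;
};

// Assumed header for this sketch: length, kind, flags, offset, segment.
constexpr uint32_t HeaderSize = 2 + 2 + 4 + 4 + 2;

// Serialized size: header, name, NUL terminator, rounded up to 4 bytes.
inline uint32_t recordSize(const CompactPublic &P) {
  return (HeaderSize + P.NameLen + 1 + 3) & ~uint32_t(3);
}

// Pass 1: give every record its offset and return the total buffer size.
inline uint32_t layoutRecords(std::vector<CompactPublic> &Pubs) {
  uint32_t Off = 0;
  for (CompactPublic &P : Pubs) {
    P.SymOffset = Off;
    Off += recordSize(P);
  }
  return Off;
}

// Pass 2: write one record into its own slot of the shared buffer.
inline void writeRecord(const CompactPublic &P, uint8_t *Buf) {
  uint8_t *Dst = Buf + P.SymOffset;
  uint32_t Size = recordSize(P);
  std::memset(Dst, 0, Size);         // NUL terminator and padding are zero
  uint16_t Len = uint16_t(Size - 2); // length excludes its own two bytes
  uint16_t Kind = 0x110e;            // S_PUB32 (assumed for this sketch)
  uint32_t Flags = P.Flags;
  std::memcpy(Dst + 0, &Len, 2);
  std::memcpy(Dst + 2, &Kind, 2);
  std::memcpy(Dst + 4, &Flags, 4);
  std::memcpy(Dst + 8, &P.Offset, 4);
  std::memcpy(Dst + 12, &P.Segment, 2);
  std::memcpy(Dst + 14, P.Name, P.NameLen);
}

inline std::vector<uint8_t> serializeAll(std::vector<CompactPublic> &Pubs) {
  std::vector<uint8_t> Buf(layoutRecords(Pubs)); // single allocation
  for (const CompactPublic &P : Pubs) // iterations touch disjoint byte ranges
    writeRecord(P, Buf.data());
  return Buf;
}
```

Because every record knows its offset after the first pass, the second loop has no dependencies between iterations, which is what makes it a candidate for a parallel-for.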

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

rnk created this revision. May 5 2020, 7:48 PM

Herald added a project: Restricted Project. May 5 2020, 7:48 PM

Herald added subscribers: mgrang, hiraditya.

Harbormaster failed remote builds in B55883: Diff 262285! May 5 2020, 9:01 PM

This looks good as far as I can tell. Is there good test coverage here, e.g. if there's an off-by-one somewhere, would it be caught?

lld/COFF/PDB.cpp
1715	Before it ran at the end of addObjectsToPDB(). Does the order matter?

In D79467#2022660, @hans wrote:

This looks good as far as I can tell. Is there good test coverage here, e.g. if there's an off-by-one somewhere, would it be caught?

I will say that the existing tests caught many issues, and that gives me some confidence in this. All the hash bucket table stuff is pretty well tested.

lld/COFF/PDB.cpp
1715	My aim was to improve cache locality, but I'm not sure it matters. Previously we would add and sort the publics, then add the section contributions (slow, flushes cache), then commit the publics to disk. I figured this order would be better. In isolation, I don't think I was able to measure a difference above the noise though. In any case, the publics aren't really part of adding objects.
llvm/lib/DebugInfo/PDB/Native/GSIStreamBuilder.cpp
123	Speaking of undertested things, these might need to be LF_PAD bytes. I will double check.

Can you diff the full output of microsoft-pdb\cvdump\cvdump.exe your_exe.pdb and llvm-pdbutil dump -all your_exe.pdb before and after this patch? (with any large exe). This can be a bit tedious because the text dump is very large, but you can at least validate that things are still right (you're probably already doing this).

llvm/lib/DebugInfo/PDB/Native/GSIStreamBuilder.cpp
213	Maybe provide a link to the reference implementation repo? (i.e. https://github.com/microsoft/microsoft-pdb)

In D79467#2022848, @aganea wrote:

Can you diff the full output of microsoft-pdb\cvdump\cvdump.exe your_exe.pdb and llvm-pdbutil dump -all your_exe.pdb before and after this patch? (with any large exe). This can be a bit tedious because the text dump is very large, but you can at least validate that things are still right (you're probably already doing this).

There is a diff in the globals stream due to different global hash bucket sorting. An example hunk looks like this:

$ diff -u lld*-pubs.txt | head -40
--- lld1-pubs.txt       2020-05-06 17:25:41.132778100 -0700
+++ lld2-pubs.txt       2020-05-06 17:25:37.352925100 -0700
@@ -197,11 +197,11 @@
            original type = 0x111F9D
   22547392 | S_UDT [size = 12] `HDC`
            original type = 0xC5A17
-  23582124 | S_GDATA32 [size = 28] `__newclmap`
+  23513760 | S_GDATA32 [size = 28] `__newclmap`
            type = 0x11445B (), addr = 0002:3648064
   23517196 | S_GDATA32 [size = 28] `__newclmap`
            type = 0x114534 (), addr = 0002:3648064
-  23513760 | S_GDATA32 [size = 28] `__newclmap`
+  23582124 | S_GDATA32 [size = 28] `__newclmap`
            type = 0x11445B (), addr = 0002:3648064
   11925196 | S_CONSTANT [size = 24] `SHRD32rri8`
            type = 0x24E8D (llvm::X86::<unnamed-tag>), value = 2704
@@ -1595,10 +1595,10 @@

The original code uses llvm::sort with gsiRecordLess, which is unstable, although deterministic. I modified the code to sort by symoffset to stabilize the output, but the new stable order doesn't match the old unstable order.
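
A minimal sketch of the tie-breaking idea, assuming a simplified record with only a name and a record offset (`Rec`, `recLess` and `sortRecords` are hypothetical names; the real comparator in the patch compares more than this). Once the offset is the final key, no two distinct records compare equal, so even an unstable sort produces one well-defined order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct Rec {
  std::string Name;
  uint32_t SymOffset;
};

// Order by name first, then by record offset to break ties between
// records that share a name.
inline bool recLess(const Rec &L, const Rec &R) {
  if (L.Name != R.Name)
    return L.Name < R.Name;
  return L.SymOffset < R.SymOffset;
}

inline void sortRecords(std::vector<Rec> &Recs) {
  // std::sort is unstable, but the comparator leaves no ties to reorder.
  std::sort(Recs.begin(), Recs.end(), recLess);
}
```

With the three `__newclmap` records from the hunk above, this ordering places offset 23513760 first, then 23517196, then 23582124.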

Other than that, all the streams and PDBs have the same size.

I checked the padding, and I believe the old publics are padded with zero bytes:
https://github.com/llvm/llvm-project/blob/01fc85dc9618394868b795c5087d9da03df9c58b/llvm/lib/DebugInfo/CodeView/SymbolRecordMapping.cpp#L42

llvm/lib/DebugInfo/PDB/Native/GSIStreamBuilder.cpp
213	I put one in the file header comment like the other files in here.

stabilize sort by symoffset, comments

Harbormaster failed remote builds in B55998: Diff 262510! May 6 2020, 6:46 PM

Seems good to me, just a few things:

lld/COFF/PDB.cpp
1346	Maybe reserve in advance to avoid reallocations?
unsigned count{};
symTab->forEachSymbol([&count](Symbol *s) {
  auto *def = dyn_cast<Defined>(s);
  count += (def && def->isLive() && def->getChunk());
});
publics.reserve(count);
Do you think there would be a gain to do the creation in parallel? (and do `.resize` instead in that case)
llvm/include/llvm/DebugInfo/PDB/Native/GSIStreamBuilder.h
60	I know the call will be optimized anyway, but shouldn't we clarify the intention of transferring the ownership? `std::vector<BulkPublic> &&Publics` (just to prevent accidents in the future, if someone removes `std::move` at the caller site during a refactoring)
llvm/lib/DebugInfo/PDB/Native/GSIStreamBuilder.cpp
72	Same thing here: `std::vector<BulkPublic> &&`
196–227	finalizeBuckets(RecordZeroOffset, std::move(Globals))
200	std::vector<BulkPublic> &&Globals
203	Nice! I am wondering if we shouldn't do `union { uint16_t Segment; uint16_t BucketIdx; };` to make things a bit more explicit. Up to you!
215	...and in that case, do we need these two assignments? if (L.BucketIdx != R.BucketIdx) return L.BucketIdx < R.BucketIdx;
340	// parallelSort is unstable, so we have to do name
349	Do you think we can avoid `.push_back` altogether, and simplify to:
PubAddrMap.resize(Publics.size());
ulittle32_t *Next = &PubAddrMap.front();
for (const BulkPublic &Pub : Publics)
  *Next++ = Pub.SymOffset;
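
The ownership suggestion for `addPublicSymbols` in the comments above can be illustrated with a small stand-alone sketch. `Sink` and `addAll` are hypothetical names, not part of the patch; the point is only that an rvalue-reference parameter forces the caller to spell out the transfer.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

class Sink {
  std::vector<int> Items;

public:
  // An rvalue-reference parameter documents that the sink takes over
  // the caller's vector instead of copying it.
  void addAll(std::vector<int> &&In) { Items = std::move(In); }

  std::size_t size() const { return Items.size(); }
};
```

A caller has to write `S.addAll(std::move(V));`. Passing `V` directly does not compile, so a refactoring that drops the `std::move` is caught by the compiler rather than silently turning into a copy.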

address comments

lld/COFF/PDB.cpp
1346	This loop was relatively short compared to what comes next. I think that is because it doesn't touch the name strings. I think we'd lose more from checking all symbols twice than we spend copying and growing the vector. There's more to be gained from parallelizing section contribution creation or the main object file processing, so I plan to look at that instead.
llvm/include/llvm/DebugInfo/PDB/Native/GSIStreamBuilder.h
60	Sure, that's a good idea.
llvm/lib/DebugInfo/PDB/Native/GSIStreamBuilder.cpp
203	I'll do it. I went the other way while working on the code because the struct is declared in a header, so it affects the PDB.cpp code. We also can't use a nameless union field because I believe that is an extension. The zero initialization also becomes complicated.
349	I think this is better because it avoids the zero-initialization of resize. This loop doesn't show up in the profile, FWIW.
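
The trade-off discussed for line 349 can be sketched as follows, using hypothetical names (`Pub`, `collectOffsets`). `reserve` followed by `push_back` allocates once and constructs each element exactly once, whereas `resize` would first value-initialize every element and then overwrite it.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Pub {
  uint32_t SymOffset;
};

inline std::vector<uint32_t> collectOffsets(const std::vector<Pub> &Pubs) {
  std::vector<uint32_t> Out;
  Out.reserve(Pubs.size()); // one allocation, no element initialization
  for (const Pub &P : Pubs)
    Out.push_back(P.SymOffset); // each element is written exactly once
  return Out;
}
```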

Looks good, thanks!

This revision is now accepted and ready to land. May 8 2020, 7:53 AM

Harbormaster failed remote builds in B56160: Diff 262871! May 8 2020, 8:33 AM

This is the (poorly annotated in paint) profile that I'm looking at, BTW:

The big remaining costs are input reading and the PDB processing of each object file. I think both operations really depend on having a good, generic, parallel primitive for building a map.

To build up the symbol table, we maintain a map from symbol name to symbol definition. To process debug info, we need to build a map from type record (ghash, really) to dst type index, and then symbols can be processed in parallel. If you have any pending patches in this area, please remind me, I've forgotten the state of things.

Unfortunately, there isn't an obviously best algorithm for building a map in parallel. One way would be to do what I did for publics: put all external symbols in a vector, sort it in parallel, remove the duplicates, but this is more work overall in the single-threaded case (O(n log n), presumably with a high constant factor to move large objects around) than our simple sequential hash table insertion.
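
The sort-then-deduplicate approach described above, reduced to its core. This sketch uses a sequential `std::sort` as a stand-in for a parallel sort, and `uniqueNames` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Build a duplicate-free, sorted set of names without a hash table.
inline std::vector<std::string> uniqueNames(std::vector<std::string> Names) {
  std::sort(Names.begin(), Names.end()); // O(n log n); the parallelizable step
  Names.erase(std::unique(Names.begin(), Names.end()), Names.end());
  return Names;
}
```

The sort dominates the cost, which is the extra single-threaded work relative to sequential hash-table insertion that the comment points out.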

We could do a map-reduce style hash & bucket computation: hash keys in parallel, divide key space equally among workers, each worker builds a map, merge each map. We could *try* to share the table since we could arrange the workers to insert into distinct buckets, but this requires careful handling of re-hashing or estimating the table size up front.
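
One way to read the "divide key space among workers" idea is sketched below. All names (`Shard`, `buildSharded`, `lookup`) are hypothetical, and the sketch skips the final merge step by keeping the per-worker maps as shards of one logical map. Each worker only inserts keys whose hash falls into its own shard, so no locking is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

using Shard = std::unordered_map<std::string, std::size_t>;

inline std::vector<Shard> buildSharded(const std::vector<std::string> &Keys,
                                       std::size_t NumWorkers) {
  assert(NumWorkers > 0);
  // Hash every key once up front; this loop could itself run in parallel.
  std::vector<std::size_t> Hashes(Keys.size());
  for (std::size_t I = 0; I < Keys.size(); ++I)
    Hashes[I] = std::hash<std::string>{}(Keys[I]);

  std::vector<Shard> Shards(NumWorkers);
  std::vector<std::thread> Workers;
  for (std::size_t W = 0; W < NumWorkers; ++W)
    Workers.emplace_back([&, W] {
      // Worker W owns exactly the keys with Hash % NumWorkers == W.
      for (std::size_t I = 0; I < Keys.size(); ++I)
        if (Hashes[I] % NumWorkers == W)
          Shards[W].emplace(Keys[I], I); // first occurrence wins
    });
  for (std::thread &T : Workers)
    T.join();
  return Shards;
}

// Look a key up in the shard that owns its hash; returns nullptr if absent.
inline const std::size_t *lookup(const std::vector<Shard> &Shards,
                                 const std::string &Key) {
  const Shard &S = Shards[std::hash<std::string>{}(Key) % Shards.size()];
  auto It = S.find(Key);
  return It == S.end() ? nullptr : &It->second;
}
```

Every worker still scans the whole hash array here; a real implementation would probably partition the keys first. The sketch also sidesteps the re-hashing problem mentioned above because each shard grows independently.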

This seems like it would be a well-studied problem, but my initial searches haven't gotten me anywhere that I like yet. Apologies if I'm retreading something you've already researched, I vaguely recall a super-parallel ghash implementation.

Closed by commit rG3b3e28a07cf5: [PDB] Optimize public symbol processing (authored by rnk). May 8 2020, 10:43 AM

This revision was automatically updated to reflect the committed changes.

Yes, I wanted to get back to that GHash parallelization at some point, but I'm swamped in deploying Clang on production and shipping some of our games. I was planning to get back to that eventually: D55585 -- However the plan was to first move all Type-related code from PDB.cpp to DebugTypes.cpp, to ease things a bit: D59226 -- this still needs to be completed (steps 5-7).

D55585 only parallelized the hashing, not the type merging. It would be possible to do hashing in parallel without dividing the keyspace, but I'm not sure yet what the best strategy is. Internally at Ubisoft we have a lock-free hashmap which has been in use for the past 12 years; it's very stable and well tested. We would be happy to open source it or reimplement it in LLVM. However, it is lock-free, not wait-free, so I wanted to try it first (on the type merging).

The other strategy I was pursuing was a lock-free and wait-free hashmap. But that requires atomic operations, with fixed 64-bit (or 128-bit) buckets (key+value). A 2x 64-bit is also doable I suppose, if the target architecture doesn't have 128-bit atomics. The big problem then is resizing the hashmap, but there again, there could be a lock-free solution, by re-hashing in a thread while the other threads are still inserting in the old hashmap. It's tricky, but there's prior art in this domain.
I think this solution would scale better than our current lock-free hashmap, which requires spinlocks when inserting nodes. That can potentially give back time slices to the kernel in the form of Sleep() or SuspendThread(), and that is a bad thing IMHO. I don't see that scaling well past the hundred-core mark; only an atomic hashmap could work in my opinion (if we want to plan ahead for the coming decade).
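
A minimal sketch of the "fixed 64-bit bucket (key+value)" idea, under stated assumptions: 32-bit keys and values packed into one atomic word, key 0 reserved to mean "empty", a fixed capacity with no resizing, and linear probing. `AtomicTable` is a hypothetical name; this is not the Ubisoft hashmap nor anything from the patches mentioned above.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Insert-only table. Each slot holds (Key << 32 | Value) in one atomic word.
class AtomicTable {
  std::vector<std::atomic<uint64_t>> Slots;

  static std::size_t hash(uint32_t Key) {
    return std::size_t(Key) * 2654435761u; // simple multiplicative hash
  }

public:
  explicit AtomicTable(std::size_t Capacity) : Slots(Capacity) {
    for (std::atomic<uint64_t> &S : Slots)
      S.store(0, std::memory_order_relaxed); // 0 marks an empty slot
  }

  // Returns the value stored for Key, inserting Value if Key was absent.
  uint32_t insert(uint32_t Key, uint32_t Value) {
    assert(Key != 0 && "key 0 marks an empty slot");
    const uint64_t Packed = (uint64_t(Key) << 32) | Value;
    const std::size_t N = Slots.size();
    std::size_t Idx = hash(Key) % N;
    for (std::size_t I = 0; I < N; ++I, Idx = (Idx + 1) % N) {
      uint64_t Cur = Slots[Idx].load(std::memory_order_acquire);
      if (Cur == 0 &&
          Slots[Idx].compare_exchange_strong(Cur, Packed,
                                             std::memory_order_acq_rel,
                                             std::memory_order_acquire))
        return Value; // this thread claimed the empty slot
      // Cur now holds whatever occupies the slot.
      if (uint32_t(Cur >> 32) == Key)
        return uint32_t(Cur); // the key was already present
    }
    assert(false && "table full; this sketch does not resize");
    return 0;
  }
};
```

Since key and value are published by a single compare-and-swap, a reader never observes a half-written entry. Resizing, which the comment identifies as the hard part, is deliberately left out.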

Another subject is how to scale this kind of algorithm across NUMA nodes. Any operation that crosses the CPU socket or NUMA boundary is very expensive. This maybe requires a latent strategy, where operations could be synchronized in bulk, not independently, across NUMA boundaries. Again, this is a hot topic in my opinion, and I don't know if there's active research there (aside from decades-long MPI knowledge in the supercomputing world, which could maybe apply). If we build a parallel type merging today that won't scale well in two years on the future EPYC, that would be a pity. I don't know, maybe it's worth doing it until it doesn't scale anymore?

If you feel like modifying or landing any of the patches above, feel free! If not, I'll eventually get back to them.

Right, I remember actually reviewing D59226, and then thinking that it had landed. I will look into rebasing it, thanks for the reminder.

@akhuang, I was thinking of the TpiSource class in that patch when I was saying that there should be an easy way to merge in types from dependent DLLs.

Speaking of which, @aganea, since you are using clang to compile now, you should try adding -Xclang -debug-info-kind=constructor if you haven't already. It greatly reduces the amount of duplicate type info that clang emits.

Regarding the lock free hash table, I guess it's the only solution in the long run. I think there is still low hanging fruit before we get there. For me, the bottleneck is no longer type merging, it's symbol merging, which can be trivially parallelized. So, if we do things like:

sequentially for all obj, run mergeDebugT
parallel for all obj, merge .debug$S using type index maps
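
The two-phase scheme above can be sketched with stand-in data structures. Everything here is hypothetical (`Obj`, `mergeAll`, types represented by a bare hash, one thread per object); it only illustrates why the second phase needs no synchronization once each object has its own type index map.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>
#include <unordered_map>
#include <vector>

struct Obj {
  std::vector<uint64_t> TypeHashes;  // one hash per local type index
  std::vector<uint32_t> SymbolTypes; // local type index used by each symbol
  std::vector<uint32_t> IndexMap;    // local -> merged index (phase 1 output)
  std::vector<uint32_t> Remapped;    // merged index per symbol (phase 2 output)
};

inline void mergeAll(std::vector<Obj> &Objs) {
  // Phase 1 (sequential): deduplicate types into one shared table and
  // record, per object, where each local type ended up.
  std::unordered_map<uint64_t, uint32_t> Merged;
  for (Obj &O : Objs)
    for (uint64_t H : O.TypeHashes)
      O.IndexMap.push_back(
          Merged.emplace(H, uint32_t(Merged.size())).first->second);

  // Phase 2 (parallel): each worker reads only its object's IndexMap and
  // writes only its object's output, so there is no shared mutable state.
  std::vector<std::thread> Workers;
  for (Obj &O : Objs)
    Workers.emplace_back([&O] {
      for (uint32_t Local : O.SymbolTypes)
        O.Remapped.push_back(O.IndexMap[Local]);
    });
  for (std::thread &T : Workers)
    T.join();
}
```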

In D79467#2027736, @rnk wrote:

Speaking of which, @aganea, since you are using clang to compile now, you should try adding -Xclang -debug-info-kind=constructor if you haven't already. It greatly reduces the amount of duplicate type info that clang emits.

We were discussing that today. Is there any drawback for doing so?

In D79467#2027765, @aganea wrote:

In D79467#2027736, @rnk wrote:

Speaking of which, @aganea, since you are using clang to compile now, you should try adding -Xclang -debug-info-kind=constructor if you haven't already. It greatly reduces the amount of duplicate type info that clang emits.

We were discussing that today. Is there any drawback for doing so?

There could be missing type info. If you use prebuilt third party libraries that were built without debug info, then it is likely that those types will be missing in the final PDB, even if all the type info is exposed in public headers that you compile.

However, this was already an issue with clang's default type info mode. The new flag moves the heuristic from the vtable to constructors, so it could make the situation worse.

You can work around this kind of issue by picking one object to compile with -fstandalone-debug, and then put static_assert(sizeof(DesiredType) >= 0, "require complete type"); in it somewhere. It might be nice to have a better way to say that without adjusting compiler flags, though.

abhinavgaba added a subscriber: abhinavgaba. May 10 2020, 9:56 PM

abhinavgaba added inline comments.

llvm/lib/DebugInfo/PDB/Native/GSIStreamBuilder.cpp

This is causing the following warnings:

ignoring packed attribute because of unpacked non-POD field ‘llvm::codeview::RecordPrefix {anonymous}::PublicSym32Layout::Prefix’
   RecordPrefix Prefix;
                ^
...
ignoring packed attribute because of unpacked non-POD field ‘llvm::codeview::PublicSym32Header {anonymous}::PublicSym32Layout::Pub’ 
   PublicSym32Header Pub;
                     ^

uabelho added a subscriber: uabelho. May 13 2020, 2:36 AM

uabelho added inline comments.

llvm/lib/DebugInfo/PDB/Native/GSIStreamBuilder.cpp
93	We see this too, we're using gcc 7.4.0 when compiling clang. We did some small experiments and the struct does get different sizes when compiled with clang vs gcc which might be a problem? With LLVM_PACKED_START/LLVM_PACKED_END around PublicSym32Layout instead of LLVM_PACKED the struct seems to get packed both with clang and gcc so perhaps that's a solution?

uabelho added inline comments. May 13 2020, 3:01 AM

llvm/lib/DebugInfo/PDB/Native/GSIStreamBuilder.cpp
93	Btw, it seems someone already noticed that clang and gcc sometimes behave differently regarding the packed attribute: https://bugs.llvm.org/show_bug.cgi?id=28571#c3
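
A small stand-alone sketch of the layout difference under discussion. It uses `#pragma pack`, which to my understanding is what the LLVM_PACKED_START/LLVM_PACKED_END pair expands to on GCC and Clang; treat that mapping as an assumption. The struct names are hypothetical and the field types are plain integers, unlike the non-POD fields in `PublicSym32Layout`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Default layout: the compiler may insert padding after Len so that
// Offset is naturally aligned.
struct Unpacked {
  uint16_t Len;
  uint32_t Offset;
};

// Byte-packed layout: no padding between or after the fields.
#pragma pack(push, 1)
struct Packed {
  uint16_t Len;
  uint32_t Offset;
};
#pragma pack(pop)

static_assert(sizeof(Packed) == 6, "packed layout has no padding");
static_assert(offsetof(Packed, Offset) == 2, "Offset directly follows Len");
```

A `static_assert` on the size of the real layout struct would turn a silently ignored packing request into a compile error on whichever compiler ignores it.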

Hopefully rG409274274 addresses the GCC warnings. I didn't get a chance to locally repro and confirm that the fix works.

In D79467#2035176, @rnk wrote:

Hopefully rG409274274 addresses the GCC warnings. I didn't get a chance to locally repro and confirm that the fix works.

It's silent for me now at least. Thanks!

Revision Contents

Path	Size
lld/COFF/PDB.cpp	41 lines
llvm/include/llvm/DebugInfo/PDB/Native/GSIStreamBuilder.h	37 lines
llvm/lib/DebugInfo/PDB/Native/GSIStreamBuilder.cpp	303 lines

Diff 262897

lld/COFF/PDB.cpp

[... 50 lines not shown ...]
#include "llvm/Object/COFF.h"		#include "llvm/Object/COFF.h"
#include "llvm/Object/CVDebugRecord.h"		#include "llvm/Object/CVDebugRecord.h"
#include "llvm/Support/BinaryByteStream.h"		#include "llvm/Support/BinaryByteStream.h"
#include "llvm/Support/CRC.h"		#include "llvm/Support/CRC.h"
#include "llvm/Support/Endian.h"		#include "llvm/Support/Endian.h"
#include "llvm/Support/Errc.h"		#include "llvm/Support/Errc.h"
#include "llvm/Support/FormatAdapters.h"		#include "llvm/Support/FormatAdapters.h"
#include "llvm/Support/FormatVariadic.h"		#include "llvm/Support/FormatVariadic.h"
#include "llvm/Support/Parallel.h"
#include "llvm/Support/Path.h"		#include "llvm/Support/Path.h"
#include "llvm/Support/ScopedPrinter.h"		#include "llvm/Support/ScopedPrinter.h"
#include <memory>		#include <memory>

using namespace llvm;		using namespace llvm;
using namespace llvm::codeview;		using namespace llvm::codeview;
using namespace lld;		using namespace lld;
using namespace lld::coff;		using namespace lld::coff;

using llvm::object::coff_section;		using llvm::object::coff_section;

static ExitOnError exitOnErr;		static ExitOnError exitOnErr;

static Timer totalPdbLinkTimer("PDB Emission (Cumulative)", Timer::root());		static Timer totalPdbLinkTimer("PDB Emission (Cumulative)", Timer::root());

static Timer addObjectsTimer("Add Objects", totalPdbLinkTimer);		static Timer addObjectsTimer("Add Objects", totalPdbLinkTimer);
static Timer typeMergingTimer("Type Merging", addObjectsTimer);		static Timer typeMergingTimer("Type Merging", addObjectsTimer);
static Timer symbolMergingTimer("Symbol Merging", addObjectsTimer);		static Timer symbolMergingTimer("Symbol Merging", addObjectsTimer);
static Timer globalsLayoutTimer("Globals Stream Layout", totalPdbLinkTimer);		static Timer publicsLayoutTimer("Publics Stream Layout", totalPdbLinkTimer);
static Timer tpiStreamLayoutTimer("TPI Stream Layout", totalPdbLinkTimer);		static Timer tpiStreamLayoutTimer("TPI Stream Layout", totalPdbLinkTimer);
static Timer diskCommitTimer("Commit to Disk", totalPdbLinkTimer);		static Timer diskCommitTimer("Commit to Disk", totalPdbLinkTimer);

namespace {		namespace {
class DebugSHandler;		class DebugSHandler;

class PDBLinker {		class PDBLinker {
friend DebugSHandler;		friend DebugSHandler;
[... 14 lines not shown ...]	public:
void addNatvisFiles();		void addNatvisFiles();

/// Add named streams specified on the command line.		/// Add named streams specified on the command line.
void addNamedStreams();		void addNamedStreams();

/// Link CodeView from each object file in the symbol table into the PDB.		/// Link CodeView from each object file in the symbol table into the PDB.
void addObjectsToPDB();		void addObjectsToPDB();

		/// Add every live, defined public symbol to the PDB.
		void addPublicsToPDB();

/// Link info for each import file in the symbol table into the PDB.		/// Link info for each import file in the symbol table into the PDB.
void addImportFilesToPDB(ArrayRef<OutputSection *> outputSections);		void addImportFilesToPDB(ArrayRef<OutputSection *> outputSections);

/// Link CodeView from a single object file into the target (output) PDB.		/// Link CodeView from a single object file into the target (output) PDB.
/// When a precompiled headers object is linked, its TPI map might be provided		/// When a precompiled headers object is linked, its TPI map might be provided
/// externally.		/// externally.
void addObjFile(ObjFile *file, CVIndexMap *externIndexMap = nullptr);		void addObjFile(ObjFile *file, CVIndexMap *externIndexMap = nullptr);

[... 1,173 lines not shown ...]	for (Chunk *c : chunks) {
continue;		continue;
pdb::SectionContrib sc = createSectionContrib(secChunk, modi);		pdb::SectionContrib sc = createSectionContrib(secChunk, modi);
file->moduleDBI->setFirstSectionContrib(sc);		file->moduleDBI->setFirstSectionContrib(sc);
break;		break;
}		}
}		}
}		}

static PublicSym32 createPublic(Defined *def) {		static pdb::BulkPublic createPublic(Defined *def) {
PublicSym32 pub(SymbolKind::S_PUB32);		pdb::BulkPublic pub;
pub.Name = def->getName();		pub.Name = def->getName().data();
		pub.NameLen = def->getName().size();

		PublicSymFlags flags = PublicSymFlags::None;
if (auto *d = dyn_cast<DefinedCOFF>(def)) {		if (auto *d = dyn_cast<DefinedCOFF>(def)) {
if (d->getCOFFSymbol().isFunctionDefinition())		if (d->getCOFFSymbol().isFunctionDefinition())
pub.Flags = PublicSymFlags::Function;		flags = PublicSymFlags::Function;
} else if (isa<DefinedImportThunk>(def)) {		} else if (isa<DefinedImportThunk>(def)) {
pub.Flags = PublicSymFlags::Function;		flags = PublicSymFlags::Function;
}		}
		pub.Flags = static_cast<uint16_t>(flags);

OutputSection *os = def->getChunk()->getOutputSection();		OutputSection *os = def->getChunk()->getOutputSection();
assert(os && "all publics should be in final image");		assert(os && "all publics should be in final image");
pub.Offset = def->getRVA() - os->getRVA();		pub.Offset = def->getRVA() - os->getRVA();
pub.Segment = os->sectionIndex;		pub.U.Segment = os->sectionIndex;
return pub;		return pub;
}		}

// Add all object files to the PDB. Merge .debug$T sections into IpiData and		// Add all object files to the PDB. Merge .debug$T sections into IpiData and
// TpiData.		// TpiData.
void PDBLinker::addObjectsToPDB() {		void PDBLinker::addObjectsToPDB() {
ScopedTimer t1(addObjectsTimer);		ScopedTimer t1(addObjectsTimer);

createModuleDBI(builder);		createModuleDBI(builder);

for (ObjFile *file : ObjFile::instances)		for (ObjFile *file : ObjFile::instances)
addObjFile(file);		addObjFile(file);

builder.getStringTableBuilder().setStrings(pdbStrTab);		builder.getStringTableBuilder().setStrings(pdbStrTab);
t1.stop();		t1.stop();

// Construct TPI and IPI stream contents.		// Construct TPI and IPI stream contents.
ScopedTimer t2(tpiStreamLayoutTimer);		ScopedTimer t2(tpiStreamLayoutTimer);
addTypeInfo(builder.getTpiBuilder(), tMerger.getTypeTable());		addTypeInfo(builder.getTpiBuilder(), tMerger.getTypeTable());
addTypeInfo(builder.getIpiBuilder(), tMerger.getIDTable());		addTypeInfo(builder.getIpiBuilder(), tMerger.getIDTable());
t2.stop();		t2.stop();
		}

ScopedTimer t3(globalsLayoutTimer);		void PDBLinker::addPublicsToPDB() {
// Compute the public and global symbols.		ScopedTimer t3(publicsLayoutTimer);
		// Compute the public symbols.
auto &gsiBuilder = builder.getGsiBuilder();		auto &gsiBuilder = builder.getGsiBuilder();
std::vector<PublicSym32> publics;		std::vector<pdb::BulkPublic> publics;
symtab->forEachSymbol([&publics](Symbol *s) {		symtab->forEachSymbol([&publics](Symbol *s) {
// Only emit defined, live symbols that have a chunk.		// Only emit external, defined, live symbols that have a chunk. Static,
		// non-external symbols do not appear in the symbol table.
auto *def = dyn_cast<Defined>(s);		auto *def = dyn_cast<Defined>(s);
if (def && def->isLive() && def->getChunk())		if (def && def->isLive() && def->getChunk())
publics.push_back(createPublic(def));		publics.push_back(createPublic(def));
});		});

if (!publics.empty()) {		if (!publics.empty()) {
publicSymbols = publics.size();		publicSymbols = publics.size();
// Sort the public symbols and add them to the stream.		gsiBuilder.addPublicSymbols(std::move(publics));
parallelSort(publics, [](const PublicSym32 &l, const PublicSym32 &r) {
return l.Name < r.Name;
});
for (const PublicSym32 &pub : publics)
gsiBuilder.addPublicSymbol(pub);
}		}
}		}

void PDBLinker::printStats() {		void PDBLinker::printStats() {
if (!config->showSummary)		if (!config->showSummary)
return;		return;

SmallString<256> buffer;		SmallString<256> buffer;
[... 342 lines not shown ...]	void lld::coff::createPDB(SymbolTable *symtab,
PDBLinker pdb(symtab);		PDBLinker pdb(symtab);

pdb.initialize(buildId);		pdb.initialize(buildId);
pdb.addObjectsToPDB();		pdb.addObjectsToPDB();
pdb.addImportFilesToPDB(outputSections);		pdb.addImportFilesToPDB(outputSections);
pdb.addSections(outputSections, sectionTable);		pdb.addSections(outputSections, sectionTable);
pdb.addNatvisFiles();		pdb.addNatvisFiles();
pdb.addNamedStreams();		pdb.addNamedStreams();
		pdb.addPublicsToPDB();

ScopedTimer t2(diskCommitTimer);		ScopedTimer t2(diskCommitTimer);
codeview::GUID guid;		codeview::GUID guid;
pdb.commit(&guid);		pdb.commit(&guid);
memcpy(&buildId->PDB70.Signature, &guid, 16);		memcpy(&buildId->PDB70.Signature, &guid, 16);

t2.stop();		t2.stop();
t1.stop();		t1.stop();
[... 215 lines not shown ...]

llvm/include/llvm/DebugInfo/PDB/Native/GSIStreamBuilder.h

[... 31 lines not shown ...]
	};			};

	namespace msf {			namespace msf {
	class MSFBuilder;			class MSFBuilder;
	struct MSFLayout;			struct MSFLayout;
	} // namespace msf			} // namespace msf
	namespace pdb {			namespace pdb {
	struct GSIHashStreamBuilder;			struct GSIHashStreamBuilder;
				struct BulkPublic;

	class GSIStreamBuilder {			class GSIStreamBuilder {

	public:			public:
	explicit GSIStreamBuilder(msf::MSFBuilder &Msf);			explicit GSIStreamBuilder(msf::MSFBuilder &Msf);
	~GSIStreamBuilder();			~GSIStreamBuilder();

	GSIStreamBuilder(const GSIStreamBuilder &) = delete;			GSIStreamBuilder(const GSIStreamBuilder &) = delete;
	GSIStreamBuilder &operator=(const GSIStreamBuilder &) = delete;			GSIStreamBuilder &operator=(const GSIStreamBuilder &) = delete;

	Error finalizeMsfLayout();			Error finalizeMsfLayout();

	Error commit(const msf::MSFLayout &Layout, WritableBinaryStreamRef Buffer);			Error commit(const msf::MSFLayout &Layout, WritableBinaryStreamRef Buffer);

	uint32_t getPublicsStreamIndex() const { return PublicsStreamIndex; }			uint32_t getPublicsStreamIndex() const { return PublicsStreamIndex; }
	uint32_t getGlobalsStreamIndex() const { return GlobalsStreamIndex; }			uint32_t getGlobalsStreamIndex() const { return GlobalsStreamIndex; }
	uint32_t getRecordStreamIndex() const { return RecordStreamIndex; }			uint32_t getRecordStreamIndex() const { return RecordStreamIndex; }

	void addPublicSymbol(const codeview::PublicSym32 &Pub);			// Add public symbols in bulk.
				void addPublicSymbols(std::vector<BulkPublic> &&Publics);

	void addGlobalSymbol(const codeview::ProcRefSym &Sym);			void addGlobalSymbol(const codeview::ProcRefSym &Sym);
	void addGlobalSymbol(const codeview::DataSym &Sym);			void addGlobalSymbol(const codeview::DataSym &Sym);
	void addGlobalSymbol(const codeview::ConstantSym &Sym);			void addGlobalSymbol(const codeview::ConstantSym &Sym);
	void addGlobalSymbol(const codeview::CVSymbol &Sym);			void addGlobalSymbol(const codeview::CVSymbol &Sym);

	private:			private:
	uint32_t calculatePublicsHashStreamSize() const;			uint32_t calculatePublicsHashStreamSize() const;
	uint32_t calculateGlobalsHashStreamSize() const;			uint32_t calculateGlobalsHashStreamSize() const;
	Error commitSymbolRecordStream(WritableBinaryStreamRef Stream);			Error commitSymbolRecordStream(WritableBinaryStreamRef Stream);
	Error commitPublicsHashStream(WritableBinaryStreamRef Stream);			Error commitPublicsHashStream(WritableBinaryStreamRef Stream);
	Error commitGlobalsHashStream(WritableBinaryStreamRef Stream);			Error commitGlobalsHashStream(WritableBinaryStreamRef Stream);

	uint32_t PublicsStreamIndex = kInvalidStreamIndex;			uint32_t PublicsStreamIndex = kInvalidStreamIndex;
	uint32_t GlobalsStreamIndex = kInvalidStreamIndex;			uint32_t GlobalsStreamIndex = kInvalidStreamIndex;
	uint32_t RecordStreamIndex = kInvalidStreamIndex;			uint32_t RecordStreamIndex = kInvalidStreamIndex;
	msf::MSFBuilder &Msf;			msf::MSFBuilder &Msf;
	std::unique_ptr<GSIHashStreamBuilder> PSH;			std::unique_ptr<GSIHashStreamBuilder> PSH;
	std::unique_ptr<GSIHashStreamBuilder> GSH;			std::unique_ptr<GSIHashStreamBuilder> GSH;
				std::vector<support::ulittle32_t> PubAddrMap;
	};			};

				/// This struct is equivalent to codeview::PublicSym32, but it has been
				/// optimized for size to speed up bulk serialization and sorting operations
				/// during PDB writing.
				struct BulkPublic {
				const char *Name = nullptr;
				uint32_t NameLen = 0;

				// Offset of the symbol record in the publics stream.
				uint32_t SymOffset = 0;

				// Section offset of the symbol in the image.
				uint32_t Offset = 0;

				union {
				// Section index of the section containing the symbol.
				uint16_t Segment;

				// GSI hash table bucket index.
				uint16_t BucketIdx;
				} U{0};

				// PublicSymFlags or hash bucket index
				uint16_t Flags = 0;

				StringRef getName() const { return StringRef(Name, NameLen); }
				};

				static_assert(sizeof(BulkPublic) <= 24, "unexpected size increase");
				static_assert(std::is_trivially_copyable<BulkPublic>::value,
				"should be trivial");

	} // namespace pdb			} // namespace pdb
	} // namespace llvm			} // namespace llvm

	#endif			#endif

llvm/lib/DebugInfo/PDB/Native/GSIStreamBuilder.cpp

//===- DbiStreamBuilder.cpp - PDB Dbi Stream Creation ------------ C++ --===//		//===- DbiStreamBuilder.cpp - PDB Dbi Stream Creation ------------ C++ --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
		//
		// The data structures defined in this file are based on the reference
		// implementation which is available at
		// https://github.com/Microsoft/microsoft-pdb/blob/master/PDB/dbi/gsi.cpp
		//
		//===----------------------------------------------------------------------===//

#include "llvm/DebugInfo/PDB/Native/GSIStreamBuilder.h"		#include "llvm/DebugInfo/PDB/Native/GSIStreamBuilder.h"

#include "llvm/ADT/DenseSet.h"		#include "llvm/ADT/DenseSet.h"
#include "llvm/DebugInfo/CodeView/RecordName.h"		#include "llvm/DebugInfo/CodeView/RecordName.h"
#include "llvm/DebugInfo/CodeView/SymbolDeserializer.h"		#include "llvm/DebugInfo/CodeView/SymbolDeserializer.h"
#include "llvm/DebugInfo/CodeView/SymbolRecord.h"		#include "llvm/DebugInfo/CodeView/SymbolRecord.h"
#include "llvm/DebugInfo/CodeView/SymbolSerializer.h"		#include "llvm/DebugInfo/CodeView/SymbolSerializer.h"
#include "llvm/DebugInfo/MSF/MSFBuilder.h"		#include "llvm/DebugInfo/MSF/MSFBuilder.h"
#include "llvm/DebugInfo/MSF/MSFCommon.h"		#include "llvm/DebugInfo/MSF/MSFCommon.h"
#include "llvm/DebugInfo/MSF/MappedBlockStream.h"		#include "llvm/DebugInfo/MSF/MappedBlockStream.h"
#include "llvm/DebugInfo/PDB/Native/GlobalsStream.h"		#include "llvm/DebugInfo/PDB/Native/GlobalsStream.h"
#include "llvm/DebugInfo/PDB/Native/Hash.h"		#include "llvm/DebugInfo/PDB/Native/Hash.h"
#include "llvm/Support/BinaryItemStream.h"		#include "llvm/Support/BinaryItemStream.h"
#include "llvm/Support/BinaryStreamWriter.h"		#include "llvm/Support/BinaryStreamWriter.h"
		#include "llvm/Support/Parallel.h"
#include "llvm/Support/xxhash.h"		#include "llvm/Support/xxhash.h"
#include <algorithm>		#include <algorithm>
#include <vector>		#include <vector>

using namespace llvm;		using namespace llvm;
using namespace llvm::msf;		using namespace llvm::msf;
using namespace llvm::pdb;		using namespace llvm::pdb;
using namespace llvm::codeview;		using namespace llvm::codeview;
Show All 21 Lines	struct llvm::pdb::GSIHashStreamBuilder {
llvm::DenseSet<CVSymbol, SymbolDenseMapInfo> SymbolHashes;		llvm::DenseSet<CVSymbol, SymbolDenseMapInfo> SymbolHashes;
std::vector<PSHashRecord> HashRecords;		std::vector<PSHashRecord> HashRecords;
std::array<support::ulittle32_t, (IPHR_HASH + 32) / 32> HashBitmap;		std::array<support::ulittle32_t, (IPHR_HASH + 32) / 32> HashBitmap;
std::vector<support::ulittle32_t> HashBuckets;		std::vector<support::ulittle32_t> HashBuckets;

uint32_t calculateSerializedLength() const;		uint32_t calculateSerializedLength() const;
uint32_t calculateRecordByteSize() const;		uint32_t calculateRecordByteSize() const;
Error commit(BinaryStreamWriter &Writer);		Error commit(BinaryStreamWriter &Writer);

void finalizeBuckets(uint32_t RecordZeroOffset);		void finalizeBuckets(uint32_t RecordZeroOffset);

		// Finalize public symbol buckets.
		void finalizeBuckets(uint32_t RecordZeroOffset,
		std::vector<BulkPublic> &&Publics);
		aganea: Same thing here: `std::vector<BulkPublic> &&`

template <typename T> void addSymbol(const T &Symbol, MSFBuilder &Msf) {		template <typename T> void addSymbol(const T &Symbol, MSFBuilder &Msf) {
T Copy(Symbol);		T Copy(Symbol);
addSymbol(SymbolSerializer::writeOneSymbol(Copy, Msf.getAllocator(),		addSymbol(SymbolSerializer::writeOneSymbol(Copy, Msf.getAllocator(),
CodeViewContainer::Pdb));		CodeViewContainer::Pdb));
}		}
void addSymbol(const CVSymbol &Symbol) {		void addSymbol(const CVSymbol &Symbol);
		};

		void GSIHashStreamBuilder::addSymbol(const codeview::CVSymbol &Symbol) {
		// Ignore duplicate typedefs and constants.
if (Symbol.kind() == S_UDT || Symbol.kind() == S_CONSTANT) {		if (Symbol.kind() == S_UDT || Symbol.kind() == S_CONSTANT) {
auto Iter = SymbolHashes.insert(Symbol);		auto Iter = SymbolHashes.insert(Symbol);
if (!Iter.second)		if (!Iter.second)
return;		return;
}		}

Records.push_back(Symbol);		Records.push_back(Symbol);
}		}
};
		namespace {
		LLVM_PACKED(struct PublicSym32Layout {
		abhinavgaba: This is causing the following warnings: "ignoring packed attribute because of unpacked non-POD field ‘llvm::codeview::RecordPrefix {anonymous}::PublicSym32Layout::Prefix’" (at `RecordPrefix Prefix;`) ... "ignoring packed attribute because of unpacked non-POD field ‘llvm::codeview::PublicSym32Header {anonymous}::PublicSym32Layout::Pub’" (at `PublicSym32Header Pub;`)
		uabelho: We see this too; we're using gcc 7.4.0 when compiling clang. We did some small experiments and the struct does get different sizes when compiled with clang vs gcc, which might be a problem? With LLVM_PACKED_START/LLVM_PACKED_END around PublicSym32Layout instead of LLVM_PACKED, the struct seems to get packed with both clang and gcc, so perhaps that's a solution?
		uabelho: Btw, it seems someone already noticed that clang and gcc sometimes behave differently regarding the packed attribute: https://bugs.llvm.org/show_bug.cgi?id=28571#c3
		RecordPrefix Prefix;
		PublicSym32Header Pub;
		// char Name[];
		});
		}

		// Calculate how much memory this public needs when serialized.
		static uint32_t sizeOfPublic(const BulkPublic &Pub) {
		uint32_t NameLen = Pub.NameLen;
		NameLen = std::min(NameLen,
		uint32_t(MaxRecordLength - sizeof(PublicSym32Layout) - 1));
		return alignTo(sizeof(PublicSym32Layout) + NameLen + 1, 4);
		}

		static CVSymbol serializePublic(uint8_t *Mem, const BulkPublic &Pub) {
		// Assume the caller has allocated sizeOfPublic bytes.
		uint32_t NameLen = std::min(
		Pub.NameLen, uint32_t(MaxRecordLength - sizeof(PublicSym32Layout) - 1));
		size_t Size = alignTo(sizeof(PublicSym32Layout) + NameLen + 1, 4);
		assert(Size == sizeOfPublic(Pub));
		auto *FixedMem = reinterpret_cast<PublicSym32Layout *>(Mem);
		FixedMem->Prefix.RecordKind = static_cast<uint16_t>(codeview::S_PUB32);
		FixedMem->Prefix.RecordLen = static_cast<uint16_t>(Size - 2);
		FixedMem->Pub.Flags = Pub.Flags;
		FixedMem->Pub.Offset = Pub.Offset;
		FixedMem->Pub.Segment = Pub.U.Segment;
		char *NameMem = reinterpret_cast<char *>(FixedMem + 1);
		memcpy(NameMem, Pub.Name, NameLen);
		// Zero the null terminator and remaining bytes.
		memset(&NameMem[NameLen], 0, Size - sizeof(PublicSym32Layout) - NameLen);
		rnk (author): Speaking of undertested things, these might need to be LF_PAD bytes. I will double check.
		return CVSymbol(makeArrayRef(reinterpret_cast<uint8_t *>(Mem), Size));
		}

uint32_t GSIHashStreamBuilder::calculateSerializedLength() const {		uint32_t GSIHashStreamBuilder::calculateSerializedLength() const {
uint32_t Size = sizeof(GSIHashHeader);		uint32_t Size = sizeof(GSIHashHeader);
Size += HashRecords.size() * sizeof(PSHashRecord);		Size += HashRecords.size() * sizeof(PSHashRecord);
Size += HashBitmap.size() * sizeof(uint32_t);		Size += HashBitmap.size() * sizeof(uint32_t);
Size += HashBuckets.size() * sizeof(uint32_t);		Size += HashBuckets.size() * sizeof(uint32_t);
return Size;		return Size;
}		}
Show All 24 Lines	Error GSIHashStreamBuilder::commit(BinaryStreamWriter &Writer) {
return Error::success();		return Error::success();
}		}

static bool isAsciiString(StringRef S) {		static bool isAsciiString(StringRef S) {
return llvm::all_of(S, [](char C) { return unsigned(C) < 0x80; });		return llvm::all_of(S, [](char C) { return unsigned(C) < 0x80; });
}		}

// See `caseInsensitiveComparePchPchCchCch` in gsi.cpp		// See `caseInsensitiveComparePchPchCchCch` in gsi.cpp
static bool gsiRecordLess(StringRef S1, StringRef S2) {		static int gsiRecordCmp(StringRef S1, StringRef S2) {
size_t LS = S1.size();		size_t LS = S1.size();
size_t RS = S2.size();		size_t RS = S2.size();
// Shorter strings always compare less than longer strings.		// Shorter strings always compare less than longer strings.
if (LS != RS)		if (LS != RS)
return LS < RS;		return LS - RS;

// If either string contains non ascii characters, memcmp them.		// If either string contains non ascii characters, memcmp them.
if (LLVM_UNLIKELY(!isAsciiString(S1) || !isAsciiString(S2)))		if (LLVM_UNLIKELY(!isAsciiString(S1) || !isAsciiString(S2)))
return memcmp(S1.data(), S2.data(), LS) < 0;		return memcmp(S1.data(), S2.data(), LS);

// Both strings are ascii, perform a case-insensitive comparison.		// Both strings are ascii, perform a case-insensitive comparison.
return S1.compare_lower(S2.data()) < 0;		return S1.compare_lower(S2.data());
}		}

void GSIHashStreamBuilder::finalizeBuckets(uint32_t RecordZeroOffset) {		void GSIHashStreamBuilder::finalizeBuckets(uint32_t RecordZeroOffset) {
std::array<std::vector<std::pair<StringRef, PSHashRecord>>, IPHR_HASH + 1>		// Build up a list of globals to be bucketed. This repurposes the BulkPublic
TmpBuckets;		// struct with different meanings for the fields to avoid reallocating a new
		// vector during public symbol table hash construction.
		std::vector<BulkPublic> Globals;
		Globals.resize(Records.size());
uint32_t SymOffset = RecordZeroOffset;		uint32_t SymOffset = RecordZeroOffset;
for (const CVSymbol &Sym : Records) {		for (size_t I = 0, E = Records.size(); I < E; ++I) {
PSHashRecord HR;		StringRef Name = getSymbolName(Records[I]);
// Add one when writing symbol offsets to disk. See GSI1::fixSymRecs.		Globals[I].Name = Name.data();
HR.Off = SymOffset + 1;		Globals[I].NameLen = Name.size();
HR.CRef = 1; // Always use a refcount of 1.		Globals[I].SymOffset = SymOffset;
		SymOffset += Records[I].length();
// Hash the name to figure out which bucket this goes into.		}
StringRef Name = getSymbolName(Sym);
size_t BucketIdx = hashStringV1(Name) % IPHR_HASH;		finalizeBuckets(RecordZeroOffset, std::move(Globals));
TmpBuckets[BucketIdx].push_back(std::make_pair(Name, HR));		}
SymOffset += Sym.length();
}		void GSIHashStreamBuilder::finalizeBuckets(uint32_t RecordZeroOffset,
		std::vector<BulkPublic> &&Globals) {
		aganea: `std::vector<BulkPublic> &&Globals`
		// Hash every name in parallel. The Segment field is no longer needed, so
		// store the BucketIdx in a union.
		parallelForEachN(0, Globals.size(), [&](size_t I) {
		aganea: Nice! I am wondering if we shouldn't do `union { uint16_t Segment; uint16_t BucketIdx; };` to make things a bit more explicit. Up to you!
		rnk (author): I'll do it. I went the other way while working on the code because the struct is declared in a header, so it affects the PDB.cpp code. We also can't use a nameless union field because I believe that is an extension. The zero initialization also becomes complicated.
		Globals[I].U.BucketIdx = hashStringV1(Globals[I].Name) % IPHR_HASH;
		});

// Compute the three tables: the hash records in bucket and chain order, the		// Parallel sort by bucket index, then name within the buckets. Within the
// bucket presence bitmap, and the bucket chain start offsets.		// buckets, sort each bucket by memcmp of the symbol's name. It's important
HashRecords.reserve(Records.size());		// that we use the same sorting algorithm as is used by the reference
		// implementation to ensure that the search for a record within a bucket can
		// properly early-out when it detects the record won't be found. The
		// algorithm used here corresponds to the function
		// caseInsensitiveComparePchPchCchCch in the reference implementation.
		aganea: Maybe provide a link to the reference implementation repo? (i.e. https://github.com/microsoft/microsoft-pdb)
		rnk (author): I put one in the file header comment like the other files in here.
		auto BucketCmp = [](const BulkPublic &L, const BulkPublic &R) {
		if (L.U.BucketIdx != R.U.BucketIdx)
		aganea: ...and in that case, do we need these two assignments? `if (L.BucketIdx != R.BucketIdx) return L.BucketIdx < R.BucketIdx;`
		return L.U.BucketIdx < R.U.BucketIdx;
		int Cmp = gsiRecordCmp(L.getName(), R.getName());
		if (Cmp != 0)
		return Cmp < 0;
		// This comparison is necessary to make the sorting stable in the presence
		// of two static globals with the same name. The easiest way to observe
		// this is with S_LDATA32 records.
		return L.SymOffset < R.SymOffset;
		};
		parallelSort(Globals, BucketCmp);

		// Zero out the bucket index bitmap.
		aganea: `finalizeBuckets(RecordZeroOffset, std::move(Globals))`
for (ulittle32_t &Word : HashBitmap)		for (ulittle32_t &Word : HashBitmap)
Word = 0;		Word = 0;
for (size_t BucketIdx = 0; BucketIdx < IPHR_HASH + 1; ++BucketIdx) {
auto &Bucket = TmpBuckets[BucketIdx];		// Compute the three tables: the hash records in bucket and chain order, the
if (Bucket.empty())		// bucket presence bitmap, and the bucket chain start offsets.
continue;		HashRecords.reserve(Globals.size());
		uint32_t LastBucketIdx = ~0U;
		for (const BulkPublic &Global : Globals) {
		// If this is a new bucket, add it to the bitmap and the start offset map.
		uint32_t BucketIdx = Global.U.BucketIdx;
		if (LastBucketIdx != BucketIdx) {
HashBitmap[BucketIdx / 32] |= 1U << (BucketIdx % 32);		HashBitmap[BucketIdx / 32] |= 1U << (BucketIdx % 32);

// Calculate what the offset of the first hash record in the chain would		// Calculate what the offset of the first hash record in the chain would
// be if it were inflated to contain 32-bit pointers. On a 32-bit system,		// be if it were inflated to contain 32-bit pointers. On a 32-bit system,
// each record would be 12 bytes. See HROffsetCalc in gsi.h.		// each record would be 12 bytes. See HROffsetCalc in gsi.h.
const int SizeOfHROffsetCalc = 12;		const int SizeOfHROffsetCalc = 12;
ulittle32_t ChainStartOff =		ulittle32_t ChainStartOff =
ulittle32_t(HashRecords.size() * SizeOfHROffsetCalc);		ulittle32_t(HashRecords.size() * SizeOfHROffsetCalc);
HashBuckets.push_back(ChainStartOff);		HashBuckets.push_back(ChainStartOff);
		LastBucketIdx = BucketIdx;
		}

// Sort each bucket by memcmp of the symbol's name. It's important that		// Create the hash record. Add one when writing symbol offsets to disk.
// we use the same sorting algorithm as is used by the reference		// See GSI1::fixSymRecs. Always use a refcount of 1 for now.
// implementation to ensure that the search for a record within a bucket		PSHashRecord HRec;
// can properly early-out when it detects the record won't be found. The		HRec.Off = Global.SymOffset + 1;
// algorithm used here corresponds to the function		HRec.CRef = 1;
// caseInsensitiveComparePchPchCchCch in the reference implementation.		HashRecords.push_back(HRec);
llvm::sort(Bucket, [](const std::pair<StringRef, PSHashRecord> &Left,
const std::pair<StringRef, PSHashRecord> &Right) {
return gsiRecordLess(Left.first, Right.first);
});

for (const auto &Entry : Bucket)
HashRecords.push_back(Entry.second);
}		}
}		}

GSIStreamBuilder::GSIStreamBuilder(msf::MSFBuilder &Msf)		GSIStreamBuilder::GSIStreamBuilder(msf::MSFBuilder &Msf)
: Msf(Msf), PSH(std::make_unique<GSIHashStreamBuilder>()),		: Msf(Msf), PSH(std::make_unique<GSIHashStreamBuilder>()),
GSH(std::make_unique<GSIHashStreamBuilder>()) {}		GSH(std::make_unique<GSIHashStreamBuilder>()) {}

GSIStreamBuilder::~GSIStreamBuilder() {}		GSIStreamBuilder::~GSIStreamBuilder() {}

uint32_t GSIStreamBuilder::calculatePublicsHashStreamSize() const {		uint32_t GSIStreamBuilder::calculatePublicsHashStreamSize() const {
uint32_t Size = 0;		uint32_t Size = 0;
Size += sizeof(PublicsStreamHeader);		Size += sizeof(PublicsStreamHeader);
Size += PSH->calculateSerializedLength();		Size += PSH->calculateSerializedLength();
Size += PSH->Records.size() * sizeof(uint32_t); // AddrMap		Size += PubAddrMap.size() * sizeof(uint32_t); // AddrMap
// FIXME: Add thunk map and section offsets for incremental linking.		// FIXME: Add thunk map and section offsets for incremental linking.

return Size;		return Size;
}		}

uint32_t GSIStreamBuilder::calculateGlobalsHashStreamSize() const {		uint32_t GSIStreamBuilder::calculateGlobalsHashStreamSize() const {
return GSH->calculateSerializedLength();		return GSH->calculateSerializedLength();
}		}

Error GSIStreamBuilder::finalizeMsfLayout() {		Error GSIStreamBuilder::finalizeMsfLayout() {
// First we write public symbol records, then we write global symbol records.		// First we write public symbol records, then we write global symbol records.
uint32_t PSHZero = 0;		uint32_t PublicsSize = PSH->calculateRecordByteSize();
uint32_t GSHZero = PSH->calculateRecordByteSize();		uint32_t GlobalsSize = GSH->calculateRecordByteSize();
		GSH->finalizeBuckets(PublicsSize);
PSH->finalizeBuckets(PSHZero);
GSH->finalizeBuckets(GSHZero);

Expected<uint32_t> Idx = Msf.addStream(calculateGlobalsHashStreamSize());		Expected<uint32_t> Idx = Msf.addStream(calculateGlobalsHashStreamSize());
if (!Idx)		if (!Idx)
return Idx.takeError();		return Idx.takeError();
GlobalsStreamIndex = *Idx;		GlobalsStreamIndex = *Idx;

Idx = Msf.addStream(calculatePublicsHashStreamSize());		Idx = Msf.addStream(calculatePublicsHashStreamSize());
if (!Idx)		if (!Idx)
return Idx.takeError();		return Idx.takeError();
PublicsStreamIndex = *Idx;		PublicsStreamIndex = *Idx;

uint32_t RecordBytes =		uint32_t RecordBytes = PublicsSize + GlobalsSize;
GSH->calculateRecordByteSize() + PSH->calculateRecordByteSize();

Idx = Msf.addStream(RecordBytes);		Idx = Msf.addStream(RecordBytes);
if (!Idx)		if (!Idx)
return Idx.takeError();		return Idx.takeError();
RecordStreamIndex = *Idx;		RecordStreamIndex = *Idx;
return Error::success();		return Error::success();
}		}

static StringRef extractPubSym(const CVSymbol *Sym, uint16_t &Seg,		void GSIStreamBuilder::addPublicSymbols(std::vector<BulkPublic> &&Publics) {
uint32_t &Offset) {		// Sort the symbols by name. PDBs contain lots of symbols, so use parallelism.
ArrayRef<uint8_t> Buf = Sym->content();		parallelSort(Publics, [](const BulkPublic &L, const BulkPublic &R) {
assert(Buf.size() > sizeof(PublicSym32Header));		return L.getName() < R.getName();
const auto *Hdr = reinterpret_cast<const PublicSym32Header *>(Buf.data());		});
Buf = Buf.drop_front(sizeof(PublicSym32Header));
Seg = Hdr->Segment;
Offset = Hdr->Offset;
// Don't worry about finding the null terminator, since the strings will be
// compared later.
return StringRef(reinterpret_cast<const char *>(Buf.data()), Buf.size());
}

static bool comparePubSymByAddrAndName(const CVSymbol *LS, const CVSymbol *RS) {
uint16_t LSeg, RSeg;
uint32_t LOff, ROff;
StringRef LName, RName;
LName = extractPubSym(LS, LSeg, LOff);
RName = extractPubSym(RS, RSeg, ROff);
if (LSeg != RSeg)
return LSeg < RSeg;
if (LOff != ROff)
return LOff < ROff;
return LName < RName;
}

/// Compute the address map. The address map is an array of symbol offsets
/// sorted so that it can be binary searched by address.
static std::vector<ulittle32_t> computeAddrMap(ArrayRef<CVSymbol> Records) {
// Make a vector of pointers to the symbols so we can sort it by address.
// Also gather the symbol offsets while we're at it.

std::vector<const CVSymbol *> PublicsByAddr;
std::vector<uint32_t> SymOffsets;
PublicsByAddr.reserve(Records.size());
SymOffsets.reserve(Records.size());

		// Assign offsets and allocate one contiguous block of memory for all public
		// symbols.
uint32_t SymOffset = 0;		uint32_t SymOffset = 0;
for (const CVSymbol &Sym : Records) {		for (BulkPublic &Pub : Publics) {
assert(Sym.kind() == SymbolKind::S_PUB32);		Pub.SymOffset = SymOffset;
PublicsByAddr.push_back(&Sym);		SymOffset += sizeOfPublic(Pub);
SymOffsets.push_back(SymOffset);		}
SymOffset += Sym.length();		uint8_t *Mem =
}		reinterpret_cast<uint8_t *>(Msf.getAllocator().Allocate(SymOffset, 4));
llvm::stable_sort(PublicsByAddr, comparePubSymByAddrAndName);
		// Instead of storing individual CVSymbol records, store them as one giant
		// buffer.
		// FIXME: This is kind of a hack. This makes Records.size() wrong, and we have
		// to account for that elsewhere.
		PSH->Records.push_back(CVSymbol(makeArrayRef(Mem, SymOffset)));

		// Serialize them in parallel.
		parallelForEachN(0, Publics.size(), [&](size_t I) {
		const BulkPublic &Pub = Publics[I];
		serializePublic(Mem + Pub.SymOffset, Pub);
		});

// Fill in the symbol offsets in the appropriate order.		// Re-sort the publics by address so we can build the address map. We no
std::vector<ulittle32_t> AddrMap;		// longer need the original ordering.
AddrMap.reserve(Records.size());		auto AddrCmp = [](const BulkPublic &L, const BulkPublic &R) {
for (const CVSymbol *Sym : PublicsByAddr) {		if (L.U.Segment != R.U.Segment)
ptrdiff_t Idx = std::distance(Records.data(), Sym);		return L.U.Segment < R.U.Segment;
assert(Idx >= 0 && size_t(Idx) < Records.size());		if (L.Offset != R.Offset)
AddrMap.push_back(ulittle32_t(SymOffsets[Idx]));		return L.Offset < R.Offset;
}		// parallelSort is unstable, so we have to do name comparison to ensure
		aganea: `// parallelSort is unstable, so we have to do name`
return AddrMap;		// that two names for the same location come out in a deterministic order.
}		return L.getName() < R.getName();
		};
		parallelSort(Publics, AddrCmp);

void GSIStreamBuilder::addPublicSymbol(const PublicSym32 &Pub) {		// Fill in the symbol offsets in the appropriate order.
PSH->addSymbol(Pub, Msf);		PubAddrMap.reserve(Publics.size());
		for (const BulkPublic &Pub : Publics)
		PubAddrMap.push_back(ulittle32_t(Pub.SymOffset));
		aganea: Do you think we can avoid `.push_back` altogether, and simplify to: `PubAddrMap.resize(Publics.size()); ulittle32_t *Next = &PubAddrMap.front(); for (const BulkPublic &Pub : Publics) *Next++ = Pub.SymOffset;`
		rnk (author): I think this is better because it avoids the zero-initialization of resize. This loop doesn't show up in the profile, FWIW.

		// Finalize public symbol buckets immediately after they have been added.
		// They should all be warm in the cache at this point, so go ahead and do it
		// now.
		PSH->finalizeBuckets(0, std::move(Publics));
}		}

void GSIStreamBuilder::addGlobalSymbol(const ProcRefSym &Sym) {		void GSIStreamBuilder::addGlobalSymbol(const ProcRefSym &Sym) {
GSH->addSymbol(Sym, Msf);		GSH->addSymbol(Sym, Msf);
}		}

void GSIStreamBuilder::addGlobalSymbol(const DataSym &Sym) {		void GSIStreamBuilder::addGlobalSymbol(const DataSym &Sym) {
GSH->addSymbol(Sym, Msf);		GSH->addSymbol(Sym, Msf);
Show All 32 Lines

Error GSIStreamBuilder::commitPublicsHashStream(		Error GSIStreamBuilder::commitPublicsHashStream(
WritableBinaryStreamRef Stream) {		WritableBinaryStreamRef Stream) {
BinaryStreamWriter Writer(Stream);		BinaryStreamWriter Writer(Stream);
PublicsStreamHeader Header;		PublicsStreamHeader Header;

// FIXME: Fill these in. They are for incremental linking.		// FIXME: Fill these in. They are for incremental linking.
Header.SymHash = PSH->calculateSerializedLength();		Header.SymHash = PSH->calculateSerializedLength();
Header.AddrMap = PSH->Records.size() * 4;		Header.AddrMap = PubAddrMap.size() * 4;
Header.NumThunks = 0;		Header.NumThunks = 0;
Header.SizeOfThunk = 0;		Header.SizeOfThunk = 0;
Header.ISectThunkTable = 0;		Header.ISectThunkTable = 0;
memset(Header.Padding, 0, sizeof(Header.Padding));		memset(Header.Padding, 0, sizeof(Header.Padding));
Header.OffThunkTable = 0;		Header.OffThunkTable = 0;
Header.NumSections = 0;		Header.NumSections = 0;
if (auto EC = Writer.writeObject(Header))		if (auto EC = Writer.writeObject(Header))
return EC;		return EC;

if (auto EC = PSH->commit(Writer))		if (auto EC = PSH->commit(Writer))
return EC;		return EC;

std::vector<ulittle32_t> AddrMap = computeAddrMap(PSH->Records);		if (auto EC = Writer.writeArray(makeArrayRef(PubAddrMap)))
if (auto EC = Writer.writeArray(makeArrayRef(AddrMap)))
return EC;		return EC;

return Error::success();		return Error::success();
}		}

Error GSIStreamBuilder::commitGlobalsHashStream(		Error GSIStreamBuilder::commitGlobalsHashStream(
WritableBinaryStreamRef Stream) {		WritableBinaryStreamRef Stream) {
BinaryStreamWriter Writer(Stream);		BinaryStreamWriter Writer(Stream);
Show All 20 Lines
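
The new `finalizeBuckets` replaces per-bucket temporary vectors with one sort keyed on (bucket index, name, symbol offset), followed by a single linear scan that emits the presence bitmap, the chain starts, and the hash records. The following is a minimal sequential sketch of that technique under simplifying assumptions: `Entry`, `BucketTables`, `toyHash`, and `buildTables` are hypothetical names, the hash is a toy byte sum rather than `hashStringV1`, names are compared with plain `std::string` ordering rather than `gsiRecordCmp`, `std::sort` stands in for `parallelSort`, and chain starts are stored as record indices rather than scaled by the 12-byte `HROffsetCalc` size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical toy record; BucketIdx is filled in by buildTables.
struct Entry {
  std::string Name;
  uint32_t SymOffset;
  uint32_t BucketIdx;
};

struct BucketTables {
  std::vector<uint32_t> Bitmap;        // one bit per non-empty bucket
  std::vector<uint32_t> ChainStarts;   // first record index of each non-empty bucket
  std::vector<uint32_t> RecordOffsets; // SymOffset + 1, in bucket order
};

// Toy hash (an assumption for this sketch): byte sum modulo NumBuckets.
inline uint32_t toyHash(const std::string &S, uint32_t NumBuckets) {
  uint32_t H = 0;
  for (unsigned char C : S)
    H += C;
  return H % NumBuckets;
}

inline BucketTables buildTables(std::vector<Entry> Entries,
                                uint32_t NumBuckets) {
  // Step 1: hash every name (done in parallel in the real change).
  for (Entry &E : Entries)
    E.BucketIdx = toyHash(E.Name, NumBuckets);

  // Step 2: one sort by bucket, then name, then offset. The offset
  // tie-break makes the result deterministic even with an unstable sort.
  std::sort(Entries.begin(), Entries.end(),
            [](const Entry &L, const Entry &R) {
              if (L.BucketIdx != R.BucketIdx)
                return L.BucketIdx < R.BucketIdx;
              if (L.Name != R.Name)
                return L.Name < R.Name;
              return L.SymOffset < R.SymOffset;
            });

  // Step 3: a single scan; a change in bucket index starts a new chain.
  BucketTables T;
  T.Bitmap.assign((NumBuckets + 31) / 32, 0);
  uint32_t LastBucketIdx = ~0U;
  for (const Entry &E : Entries) {
    if (E.BucketIdx != LastBucketIdx) {
      T.Bitmap[E.BucketIdx / 32] |= 1U << (E.BucketIdx % 32);
      T.ChainStarts.push_back(static_cast<uint32_t>(T.RecordOffsets.size()));
      LastBucketIdx = E.BucketIdx;
    }
    // Offsets are stored with one added, as noted in the diff.
    T.RecordOffsets.push_back(E.SymOffset + 1);
  }
  return T;
}
```

Because every record carries its own bucket index after step 1, the sort is the only step that needs to reorder data, and it operates on a single contiguous vector, which is what makes it a good fit for a parallel sort.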


Revision Contents

Diff 262897
