This is an archive of the discontinued LLVM Phabricator instance.

[PDB] Emit S_UDT records in LLD
ClosedPublic

Authored by zturner on Dec 3 2018, 11:37 AM.

Details

Summary

This patch emits S_UDT records from LLD. These represent typedefs. The compiler emits these into the .debug$S section, and the linker must split these up between the PDB globals stream and PDB module stream, depending on whether it is a global typedef or a function local typedef.

Note there is a problem here, which is that we emit local typedefs using their fully qualified name, so we append the function name to the beginning. This is obviously wrong, and it is a simple fix, however it's orthogonal to this change so I'll submit that patch to the compiler in a followup.

Diff Detail

Event Timeline

zturner created this revision.Dec 3 2018, 11:37 AM
zturner updated this revision to Diff 176441.Dec 3 2018, 11:43 AM

Cleanup the test a little bit. Also added a CHECK-NOT: S_END line to make sure that the typedef we emit actually is inside the proper scope.

zturner updated this revision to Diff 176442.Dec 3 2018, 11:44 AM

Update the diff for real.

rnk added inline comments.Dec 3 2018, 12:04 PM
lld/test/COFF/s_udt.test
2

Any particular reason for this to be an IR test that uses llc? LTO isn't involved, so you could either do a yaml input or an assembly file input. Which ever is more readable.

If you do want an IR test, can we name it .ll? We should be able to add .ll to the list of test case suffixes if it's not in there already.

llvm/lib/DebugInfo/PDB/Native/GSIStreamBuilder.cpp
52–59

I don't think we should assume xxHash64 doesn't have collisions. The space of 64-bit hashes is big, but it's not a content hash like SHA1, so it should be possible to construct some collisions. Someone has code to do it here: https://github.com/Cyan4973/xxHash/issues/54

If you don't want to add the collision resolution code because it's slow, at least try linking browser_tests with extra instrumentation code to detect collisions. If you find a collision, then we should keep the resolution code and add that as a test case. If you don't find a collision, then at least put a large comment here documenting our decision to sometimes drop typedefs with matching xxHash64 values.

Also, you might want to structure this code in such a way that it can be extended for S_CONSTANT, which also needs deduplication: https://bugs.llvm.org/show_bug.cgi?id=37933

zturner marked an inline comment as done.Dec 3 2018, 12:08 PM
zturner added inline comments.
llvm/lib/DebugInfo/PDB/Native/GSIStreamBuilder.cpp
52–59

The thing about collisions is that the consequences of a collision are that we end up with a few extra bytes in the PDB. So it seems unimportant to worry about a collision, even in the case that we do have collisions.

aganea added inline comments.Dec 3 2018, 12:12 PM
lld/test/COFF/s_udt.test
35

Could you possibly add here the source code that generated this test?

llvm/lib/DebugInfo/PDB/Native/GSIStreamBuilder.cpp
48

Could someone insert twice the same S_UDT record by calling GSIStreamBuilder::addGlobalSymbol(const UDTSym &Sym)? Replace Records.push_back by a call to addSymbol in that case.

aganea added inline comments.Dec 3 2018, 1:54 PM
llvm/lib/DebugInfo/PDB/Native/GSIStreamBuilder.cpp
52–59

Unless I'm missing something, @rnk is right: the PDB will not receive a colliding record. If you're inserting a S_UDT 'typedef char B' with hash 0x2, and a previous 'typedef int A' with same hash 0x2 was already there in UdtHashes, the new record 'B' will not make it to Records.push_back().

Yea he's right, I was just having a brain fail when I wrote that. I'm updating the code to correctly handle collisions. I don't think we need to do anything special for S_CONSTANT though, because the DenseSet<> is just holding hash values of generic CVSymbol records, which could be any kind of symbol, including S_CONSTANT. With a record-kind-agnostic hash set and the ability to handle collisions, we just need to change the if statement to a switch statement and we can handle all the types we want.

zturner updated this revision to Diff 176680.Dec 4 2018, 11:17 AM

Changed the test to an llvm assembly test, and properly handle collisions. I actually added some instrumentation to detect if any collisions were actually occurring, and I did not find any with a PDB containing 450,000 S_UDT records. On the other hand, I did not measure a performance impact from adding this, so I'm adding it anyway. We can revisit later if it shows up on a profile.

Note that the test changed a little bit. For some reason, when I use an IR test and llc it generates different assembly than if I use an assembly test with llvm-mc. I don't know if this is normal, but throwing it out there just in case.

aganea added inline comments.Dec 4 2018, 11:47 AM
lld/test/COFF/s_udt.s
20 ↗(On Diff #176680)

Just to be sure, 0x1003 is here because of the yet unlanded D55236? It should read 0x1006 once it lands? You would need to re-generate this test at that point?

llvm/lib/DebugInfo/PDB/Native/GSIStreamBuilder.cpp
46

Since you removed GSIStreamBuilder::addGlobalSymbol below, do we still need this function?

zturner marked 2 inline comments as done.Dec 4 2018, 11:52 AM
zturner added inline comments.
lld/test/COFF/s_udt.s
20 ↗(On Diff #176680)

Yes, whoever lands second would need to update their patch.

llvm/lib/DebugInfo/PDB/Native/GSIStreamBuilder.cpp
46

I only removed the overload that takes a UDTSym (it was never called), but there are still other overloads. Once we extend the hash lookup to include types other than S_UDT, we can evaluate the other methods. I could also probably just change the push_back to an addSymbol since there's no reason not to. I'll do that offline before I submit.

aganea accepted this revision.Dec 4 2018, 11:58 AM

Thank you!

This revision is now accepted and ready to land.Dec 4 2018, 11:58 AM
This revision was automatically updated to reflect the committed changes.