Page MenuHomePhabricator

[PDB] Make LLD PDBs look a little more like Microsoft PDBs

Authored by zturner on Mar 22 2018, 3:45 PM.



This effort came about when trying to diagnose some problems with LLD PDBs that would cause certain debugging features to not work. It was fairly difficult to investigate why our PDB was behaving differently because there were a lot of differences in the bytes we emitted, most of which were meaningless.

For example, we would output stream A to stream 7 then stream B to stream 12, then construct the name stream map which says Stream A is at 7 and B is at 12.

They would output stream A to stream 15 and stream B to stream 3, then construct the name stream map which says Stream B is at 3 and stream A is at 15.

Since there is a hash table, and buckets, and values in the buckets, and the order of insertion matters, you could end up looking at 2 completely different sequences of bytes that were both ultimately correct.

Issues like this make looking for *important* byte-level differences almost impossible, and it wasn't until I spent considerable time eliminating these superfluous byte level differences that I was ultimately able to find the real problem.

So, the idea is that these changes should aid future investigations while being NFC. Note that none of these changes are necessary for correctness, they are just good to have anyway.

Diff Detail

rLLD LLVM Linker

Event Timeline

zturner created this revision.Mar 22 2018, 3:45 PM
rnk accepted this revision.Mar 22 2018, 3:53 PM


66 ↗(On Diff #139526)

Are you sure you don't want to check the optional value and do the old error path? Honestly, this is an internal error, we could assert.

This revision is now accepted and ready to land.Mar 22 2018, 3:53 PM
majnemer added inline comments.
41–47 ↗(On Diff #139526)

This pseudo-code infinite loops. What's the termination condition?

zturner added inline comments.Mar 22 2018, 7:24 PM
41–47 ↗(On Diff #139526)

The body of the loop is basically the "insert" function of the hash table, so it's trying to illustrate that on each insert, they increment the count, and if the load factor goes over 75% they multiply the number of buckets by 1.5 and rehash.

If there's a way to write the pseudocode that's more clear I'm happy to change it.

What about this:

// Initial conditions
int BucketCount = 1;
int StringCount = 0;

// Called each time a new string is inserted.
void onStringInserted() {
  if (BucketCount * 3 / 4 < StringCount)
    BucketCount = BucketCount * 3 / 2 + 1;
majnemer added inline comments.Mar 22 2018, 8:05 PM
41–47 ↗(On Diff #139526)


However, I think the sign has some interesting implications.

If StringCount was 49 (with BucketCount 1034394550), this code will overflow:
1034394550 * 3 is -1191783646 which will be less than StringCount (now 50).

Your version will end up computing the mathematically correct result of 1551591826.

48–49 ↗(On Diff #139526)

a006999 53 is 3491081611
a006999 54 is 5236622417, this number is too big to fit in a uint32_t. This would end up computing (uint32_t)5236622417 which is 941655121.

zturner added inline comments.Mar 22 2018, 9:01 PM
41–47 ↗(On Diff #139526)

I'll have to check with the MS source code tomorrow and see if theirs overflows. Perhaps they're using a uint64 and I missed that.

This revision was automatically updated to reflect the committed changes.