This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo][PDB] Fix too many FPM blocks being written in some cases
AbandonedPublic

Authored by colden on Jan 4 2018, 10:37 AM.

Details

Reviewers
zturner
Summary

The current interval-counting math has two issues:

  • The Super Block (and Fpm0 when using Fpm1) are included in the block count, even though they don't have associated free page maps
  • When calculating the number of existing free page maps, we should divide by block size + 1, to include the FPMs associated with the blocks

This error only occurs when the total number of blocks present is within 1 (up or down) of a multiple of 4096, so it's pretty rare. Mine was hitting right at 12289.

I'm 90% sure this is the correct fix for this, but if anyone can point out a better place I'm happy to poke around some more.

Diff Detail

Repository
rL LLVM

Event Timeline

colden created this revision. Jan 4 2018, 10:37 AM

Out of curiosity, what was the symptom you were experiencing and how did you come to the conclusion that it was related to the FPM calculations?

I'll have to go back to the microsoft repo to re-read their algorithm and see if I can find any hints about how they factor in these values.

This actually took me 3-4 days to track down, but I'll try to explain as clearly as I can :)

The actual crash was in WritableMappedBlockStream::writeBytes (MappedBlockStream.cpp), the write on line 408 was failing because we were writing past the end of the buffer. The call to write the bytes came from WritableMappedBlockStream::createFpmStream, on line 370. getFpmStreamLayout had reported the number of intervals as 4 instead of 3, so we were attempting to write an extra FPM block past the end of the buffer.

If my understanding of the format is correct, the file should look something like this:
+-------------+------+------+----------------+-----+----------------+
| Super Block | Fpm0 | Fpm1 | 4096 blocks... | Fpm | 4096 blocks... |
+-------------+------+------+----------------+-----+----------------+
The number of intervals should be the total number of blocks, minus the reserved/not-in-use-FPM blocks (1 if using Fpm0, 2 if using Fpm1), divided by the number of blocks per interval, which should be 4097: 1 for the leading FPM block and 4096 for the actual data blocks.
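
To spell out that arithmetic, here's roughly what I mean (just a sketch with made-up names, not the actual getNumFpmIntervals, and I'm assuming ceiling division):

#include <cstdint>

// Sketch only: hypothetical names, not the real MSF helpers.
constexpr uint32_t BlockSize = 4096;

uint32_t sketchNumFpmIntervals(uint32_t TotalBlocks, bool UsingFpm1) {
  // Blocks with no FPM in front of them: the Super Block, plus Fpm0 if the
  // file is running off Fpm1 (per the layout above).
  uint32_t Reserved = UsingFpm1 ? 2 : 1;
  uint32_t BlocksPerInterval = BlockSize + 1; // 1 leading FPM + 4096 blocks
  // Round up so a partial trailing interval still gets its own FPM block.
  return (TotalBlocks - Reserved + BlocksPerInterval - 1) / BlocksPerInterval;
}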

I will admit I haven't looked at the MS algorithms, but from the LLVM PDB MSF docs page, this is what I came up with.

zturner added a comment. Edited Jan 4 2018, 11:25 AM

So here's the relevant code:

https://github.com/Microsoft/microsoft-pdb/blob/master/PDB/msf/msf.cpp#L2508-L2516

CB cbpg = cbPg();    // (1)

// Calc number of bytes needed to represent FPM, aligned to FPM::native_word sizes
//
CB cbFpm = (pnMac() + CHAR_BIT - 1) / CHAR_BIT;   // (2)
cbFpm = sizeof(FPM::native_word) * ((cbFpm + sizeof(FPM::native_word) - 1) / sizeof(FPM::native_word));   // (3)

// Calc number of pages needed to represent that number of bytes
UPN cpnFpm = (cbFpm + cbpg - 1) / cbpg;  // (4)

The way I read this code:

At (1), cbpg = 4096, the number of bytes in a page (always hardcoded to 4096 since we only support what they refer to as "BigMsf").
At (2), cbFpm = ceil(# of pages / 8). Conceptually, each byte of FPM data covers 8 pages, so this is really "# of bytes of fpm data needed given the number of pages we need to write".
At (3), cbFpm = # of bytes of fpm data needed rounded up to a multiple of pointer size
At (4), cpnFpm = # of bytes of fpm data needed rounded up to a multiple of pointer size and then rounded up to a multiple of block size

Rounding up to a multiple of pointer size and *then* rounding up to a multiple of block size seems redundant, so I interpret this as just:

  • # of bytes of fpm data needed rounded up to a multiple of block size
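
Or, restating that snippet loosely in C++ (my own names, not the MS ones, and assuming the 4096-byte page size):

#include <cstdint>

// Loose restatement of the msf.cpp snippet above; names are made up.
uint32_t fpmPagesNeeded(uint32_t NumPages, uint32_t PageSize = 4096) {
  // (2) One FPM bit per page, rounded up to whole bytes.
  uint32_t FpmBytes = (NumPages + 7) / 8;
  // (3) Round up to a multiple of the native word size (8 bytes here).
  FpmBytes = 8 * ((FpmBytes + 7) / 8);
  // (4) Round that byte count up to whole pages.
  return (FpmBytes + PageSize - 1) / PageSize;
}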

But the question I kind of skipped over is "What is # of pages? Does it include the FPM and super block?" I believe it does. For example, I ran llvm-pdbutil against clang.pdb, which was generated by MSVC, and I get this:

D:\src\llvmbuild\cl\Debug\x64>bin\llvm-pdbutil.exe dump --summary bin\clang.pdb


                          Summary
============================================================
  Block Size: 4096
  Number of blocks: 167547
  Number of streams: 1442
  Signature: 1514918418
  Age: 3
  GUID: {D1F84556-D25E-7045-BF75-5D4A1D9274AB}
  Features: 0x1
  Has Debug Info: true
  Has Types: true
  Has IDs: true
  Has Globals: true
  Has Publics: true
  Is incrementally linked: true
  Has conflicting types: false
  Is stripped: false

D:\src\llvmbuild\cl\Debug\x64>dir bin\clang.pdb
 Volume in drive D is New Volume
 Volume Serial Number is 2EC5-6F01

 Directory of D:\src\llvmbuild\cl\Debug\x64\bin

01/03/2018  10:57 AM       686,272,512 clang.pdb
               1 File(s)    686,272,512 bytes
               0 Dir(s)   4,718,669,824 bytes free

We have to assume this PDB is authoritative since it was generated by MSVC, so whatever it does is correct by definition.

686272512 / 4096 = 167547, which matches exactly the number of blocks in the MSF header.

So based on this analysis, it looks like:

  1. The "# of pages" field in the MSF header definitely includes the first 3 reserved pages.
  2. The calculation currently used matches the calculation in the reference implementation.

So either my analysis is wrong (very likely, it's always hard analyzing this kind of dense code when we can't step through it in a debugger or even run the real code), or the bug lies elsewhere (perhaps in the write function?).

Interested to hear your thoughts though, since you've already spent much more time thinking about this than I have :)

Slight correction. cpnFpm is the # of pages needed to represent cbFpm bytes, which should be equivalent to divideCeil(NumBlocks, BlockSize * 8);
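
As a quick back-of-the-envelope check against the clang.pdb numbers above (just arithmetic, not the real API):

#include <cassert>
#include <cstdint>

int main() {
  // divideCeil(NumBlocks, BlockSize * 8): one FPM bit per block, and
  // BlockSize * 8 bits fit in one FPM page.
  uint32_t NumBlocks = 167547;        // from the clang.pdb dump above
  uint32_t BitsPerFpmPage = 4096 * 8; // 32768
  uint32_t FpmPages = (NumBlocks + BitsPerFpmPage - 1) / BitsPerFpmPage;
  assert(FpmPages == 6); // 167547 / 32768 is a bit over 5, so 6 pages
  return 0;
}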

Have to think about this more though, it's possible I'm wrong.

zturner added a comment. Edited Jan 4 2018, 11:54 AM

So this is interesting:

https://github.com/Microsoft/microsoft-pdb/blob/master/PDB/msf/msf.cpp#L1663-L1695

Especially the part about "Mark all non-special pages free". But then it just says setAll(). Maybe this backs up your theory that the first 3 pages are "special" and so they are always allocated and excluded from the FPM entirely.

Would this mean that the 8 bits in the first byte of the first page of the FPM actually refer to pages [3,11) in the underlying MSF file?

Ok, now I definitely think the first 3 pages are included in the FPM. I'm looking at this code:

This code gets run upon initialization of a new MSF.

for (UPN pn = 0; pn < msfparms.pnDataMin; pn++) {
    if (fpm.nextPn() != pn)
        assert(FALSE);
}

Here, pnDataMin == 3, and fpm.nextPn() basically means "mark this page allocated in the bitmap". So this code translates to:

fpm.setAllocated(0);
fpm.setAllocated(1);
fpm.setAllocated(2);

So the FPM is definitely recording information about these 3 pages.

I'm going to inspect the code that crashed in WritableMappedBlockStream::writeBytes and see if maybe there's something there.

Is there a reliable way to repro this crash?

colden added a comment. Jan 4 2018, 1:19 PM

In the process of writing this comment, I rewrote it like 8 times whenever I discovered something. So sorry if it reads a bit disjointed :)

I'm 99% sure the NumBlocks includes the special blocks at the beginning, and the in-stride FPMs. I think this is what's actually causing the math in getNumFpmIntervals to be incorrect. What I still do not know is what transformations actually need to be applied, and to where.

After playing around with numbers, I do agree that the first FPM includes the 3 special blocks. Glad to see the MS code corroborates that.

My new theory is that the in-stride FPMs include themselves: the first bit of each in-stride FPM is actually just for itself. This is supported by the current code placing the next FPM exactly 4096 blocks after the previous one, and it meshes well with the first-FPM-includes-special-blocks theory. Then the layout would be something like this:

+-------------+------+------+----------------+------+----------------+------+----------------+
| 0           | 1    | 2    | 3-4096         | 4097 | 4098-8192      | 8193 | 8194-12289     |
| Super Block | Fpm0 | Fpm1 | 4094 blocks... | Fpm  | 4095 blocks... | Fpm  | 4096 blocks... |
+-------------+------+------+----------------+------+----------------+------+----------------+

The last column is the part where this all falls apart, because if it does only store 4095 actual blocks per FPM, then it SHOULD be 4 intervals (the last one having 1 block).

The other thing I do not understand is how the Fpm1 actually works. I.e., if we're using Fpm0, does that mean that what would be Fpm1 is actually the first writable block? This would mean that the file layout could change significantly if you change Fpms; a block could overflow and require a new in-stride FPM to support the one extra block.

The way I see it, there are two options that would get a 12289-block file into 3 intervals:

  1. When using Fpm0, Fpm1 is actually a writable block, and now we get 4095 blocks in the first stride, meaning the last one also only has 4095.
  2. No matter what, the first in-stride FPM is always 4096 after Fpm1 instead of the in-use Fpm (usually Fpm0). In this case, the first in-stride FPM would actually be at block 4098, and that would get us another usable block in the first stride. (This is my favorite theory, as it results in the most layout stability)

I'll dig through the MS code a bit more and see if it becomes clear. It sure sounds like you have a better understanding of the MS code than I do, so maybe you can put the puzzle pieces together faster :)

As for repro, that's a tricky question, because I'm working with proprietary code, so I don't think I can just send you my .obj files. And because it requires such a specific number of blocks to be present, recreating it could be tricky. That said, if you have any ideas I'm more than happy to try them out locally. I can repro it 10/10.

zturner added a comment. Edited Jan 4 2018, 1:25 PM

No worries on the repro, I actually got one myself, and I now actually understand what's causing the problem.

Honestly the repro is trivial. All you need to do is edit MappedBlockStreamTest.cpp line 512 to look like this:

constexpr uint32_t NumFileBlocks = 4096 + 1;

And the createFpmStream unit test will crash. +1 is the only offset that triggers it (so -1, for example, doesn't). But because of this I now think I know *why* it is crashing. At every stride there are two FPMs. So basically FPM0 starts at block 1 and has a new piece every 4096 blocks. FPM1 starts at block 2 and has a new piece every 4096 blocks. So this means FPM0 has a piece at 4097 and FPM1 has a piece at 4098. I think this is what's causing the problem. We create an MSF file that only has 4097 blocks *total*, and then we try to "initialize" the FPM page at block 4098.
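
To put concrete numbers on that (a throwaway sketch of the layout as I described it, not the actual createFpmStream logic):

#include <cstdint>
#include <cstdio>

int main() {
  // Rough model of the layout described above: FPM0 has a piece at block
  // 1, 4097, 8193, ...; FPM1 at 2, 4098, 8194, ... (one pair per stride).
  const uint32_t BlockSize = 4096;
  const uint32_t NumFileBlocks = 4096 + 1;  // the failing unit-test value
  for (uint32_t Fpm0 = 1; Fpm0 <= NumFileBlocks; Fpm0 += BlockSize) {
    uint32_t Fpm1 = Fpm0 + 1;
    bool InRange = Fpm1 < NumFileBlocks;  // valid block indices are [0, 4096]
    printf("FPM pair at blocks %u/%u%s\n", Fpm0, Fpm1,
           InRange ? "" : "  <-- past the end of a 4097-block file");
  }
  return 0;
}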

I think the fix is probably as simple as

if ((TotalFileBlocks - 1) % BlockSize == 0)
  ++TotalFileBlocks;

I'm playing around with this locally in the meantime.

colden added a comment. Jan 4 2018, 1:30 PM

OOOOH That makes so much sense. I was wondering how you could have 2 FPMs at the start but only one every stride after that, but never connected that to this...

I think that's probably close, but wouldn't that only allocate an extra FPM at the end of the stream? Shouldn't we increment the block count by 2 whenever adding a new in-stride FPM? Presumably it's already adding 1 somewhere.

colden added a comment. Jan 4 2018, 1:32 PM

Just kidding, it already does that:

MSFBuilder.cpp:113-125

// If we crossed over an fpm page, we actually need to allocate 2 extra
// blocks for each FPM group crossed and mark both blocks from the group as
// used.  We may not actually use them since there are many more FPM blocks
// present than are required to represent all blocks in a given PDB, but we
// need to make sure they aren't allocated to a stream or something else.
// At the end when committing the PDB, we'll go through and mark the
// extraneous ones unused.
while (NextFpmBlock < NewBlockCount) {
  NewBlockCount += 2;
  FreeBlocks.resize(NewBlockCount, true);
  FreeBlocks.reset(NextFpmBlock, NextFpmBlock + 2);
  NextFpmBlock += BlockSize;
}

Just to confirm, does your call stack look something like:

WritableMappedBlockStream::createFpmStream  (PDBFileBuilder.cpp:151)
PDBFileBuilder::commitFpm
PDBFileBuilder::commit
PDBLinker::commit

? I'm still thinking about where the right place to add this check is. I think we shouldn't allow an invalid number of blocks to even make it into the MSFLayout struct. So I'm thinking that it's probably best to add lots of asserts in the various functions that examine MSFLayout instances, and then have one place that only creates "correct" MSFLayout instances. At the same time though, it doesn't make a lot of sense to have the last block in the entire file be an FPM block. So maybe we can actually *delete* it rather than add another one.

On second thought, maybe we should just support this rare case. We can guarantee that LLD won't generate a PDB with odd numbers of blocks like this, but I don't think we can make the same assumption about MSVC's linker. So perhaps it's better to just handle it gracefully and not access invalid memory.

colden added a comment. Jan 4 2018, 1:55 PM

Yeah, that's the callstack I'm seeing. And I'm all for as many asserts as we can add to guarantee safety.

Theoretically it should only add FPM blocks to the end when we've spilled over from the previous FPM's range, right? So unless we remove blocks from the end, there should never just be an FPM at the end.

Maybe I'm misunderstanding, but are new blocks only ever added through getFpmStreamLayout or MSFBuilder::allocateBlocks? If that's the case, theoretically as long as those handle adding 2 FPMs correctly on overflow, we should be in good shape. Or do you think we got an FPM at the end from MSVC?

If it's an MSVC thing, we should be able to fix it by just lopping the last FPMs off at load time. If they're the last blocks in a file, they can't possibly reference anything, and there's not much point in keeping them in memory.

In the case that causes the crash, which I reduced down to just the unit test with NumBlocks = 4097, we will have blocks [0, 4096]. The two subsequent FPM blocks would have indices 4097 and 4098, and neither of those is allocated in the file.

So upon further thought I don't think the problem is that we're allocating one FPM block but not the other, but rather that we're not allocating *either* of them. The layout is basically:

// The FPM is going to look like this:
// |  0  |   1   |   2   |  ...  ┊  4096  |  4097  |  4098  | ... | 
// | SB  |  FPM0 | FPM1  | Data  ┊  Data  |  FPM0  |  FPM1  | ... |

Here I'm using a ┊ character to indicate the boundary for the FPM block interval period.

So the error happens because we're computing divideCeil(NumBlocks, BlockSize). This will report that we need an FPM interval for the second period, even though 4096 is the highest block number. I think the fix is a more accurate computation here, very similar to what you started with actually :)
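
To make the off-by-one concrete (a throwaway check; divideCeil here is a local stand-in for the helper, and the second expression is just one way to get the right count for this particular file, not necessarily what the follow-up patch does):

#include <cassert>
#include <cstdint>

// Local stand-in for the ceiling division used in the interval count.
static uint32_t divideCeil(uint32_t N, uint32_t D) { return (N + D - 1) / D; }

int main() {
  const uint32_t BlockSize = 4096;
  const uint32_t NumBlocks = 4097; // blocks [0, 4096], first FPM pair at 1/2
  // Current computation: reports a second interval even though block 4096 is
  // the highest block, so the 4097/4098 FPM pair isn't even inside the file.
  assert(divideCeil(NumBlocks, BlockSize) == 2);
  // Ignoring the Super Block gives 1 interval, which is what this file needs.
  assert(divideCeil(NumBlocks - 1, BlockSize) == 1);
  return 0;
}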

I have a patch, I'll upload it separately and Cc you.

colden abandoned this revision. Jan 4 2018, 3:29 PM

Abandoned in favor of D41742.