This is an archive of the discontinued LLVM Phabricator instance.

[MSF] Fix FPM interval calculation
ClosedPublic

Authored by zturner on Jan 4 2018, 2:55 PM.

Details

Summary

The FPM (free page map) in an MSF file is an odd structure. It is a series of blocks that are spread throughout the file at periodic intervals. And there are actually two separate FPMs organized in the file this way.

The first one, FPM0, begins at block 1 and a new piece occurs every BlockSize blocks.
The second one, FPM1, begins at block 2 and a new piece also occurs every BlockSize blocks.

Graphically, this can be represented like this:

Block Index |  0  |   1   |   2   |  ...  ┊  4096  |  4097  |  4098  | ... | 
    Meaning | SB  |  FPM0 | FPM1  | Data  ┊  Data  |  FPM0  |  FPM1  | ... |

where ┊ indicates a a period-boundary.

We had an off by 1 error where if there were exactly 4097 blocks (or, for that matter, 4096 * n + 1 blocks for any n), one piece of code would decide "oh that ends before FPM0 from this period, so we don't have an FPM0 to worry about" whereas another piece of code would say "there are two FPM intervals here" and then when iterating over those 2 intervals, would access uninitialized memory since the other piece of code only returned data from 1.

The fix here is to properly handle these edge cases. If block count is of the form 4096 * n + 1, we properly return n fpm intervals instead of n+1.

There's another odd edge case that wasn't being triggered, but was found by inspection when looking into this issue. We shouldn't ever create a PDB file with 4098 blocks, because that would look like this:

Block Index |  0  |   1   |   2   |  ...  ┊  4096  |  4097  |
    Meaning | SB  |  FPM0 | FPM1  | Data  ┊  Data  |  FPM0  |

and here we've got a continuation of FPM0 but not of FPM1. Although we can promise never to generate one of these, we probably should have the code handle it gracefully anyway and not crash (since we can't control what we see).

Initial investigative work and patch was done by Colden Cullen in D41734 (there's also some interesting discussion over there if anyone wants to follow)

Diff Detail

Repository
rL LLVM

Event Timeline

zturner created this revision.Jan 4 2018, 2:55 PM
zturner edited the summary of this revision. (Show Details)Jan 4 2018, 2:56 PM
colden added a comment.Jan 4 2018, 3:33 PM

Makes sense! Thanks for adding a test btw.

I tested this with my use-case, and it does indeed fix the problem.

llvm/include/llvm/DebugInfo/MSF/MSFCommon.h
118 ↗(On Diff #128656)

Naming nitpick: In other contexts, AltFpm implies the opposite of Msf.SB->FreeBlockMapBlock (i.e. 3 - FpmBlock). How about instead of taking that param here, you take an int for FpmIndex (or something), require it be 1 or 2, and then in the MSFLayout overload you can do the AltFpm transformation?

119 ↗(On Diff #128656)

Ah, so I was almost right! Just didn't need the +1 on BlockSize.

colden added inline comments.Jan 4 2018, 3:47 PM
llvm/include/llvm/DebugInfo/MSF/MSFCommon.h
118 ↗(On Diff #128656)

Follow up question: shouldn't it not matter which FPM you're using?

zturner added inline comments.Jan 4 2018, 3:58 PM
llvm/include/llvm/DebugInfo/MSF/MSFCommon.h
118 ↗(On Diff #128656)

It only matters insofar as we want to handle that last edge case mentioned in the description of the diff. It won't matter for your case, but if somehow there were a PDB with *exactly* 4098 blocks (which again, LLD would never generate), then there would be 2 FPM0 intervals and 1 FPM1 interval.

colden added inline comments.Jan 4 2018, 4:00 PM
llvm/include/llvm/DebugInfo/MSF/MSFCommon.h
118 ↗(On Diff #128656)

Oh I see what you mean. It's too bad we can't trim those on load, but maybe that's another patch for another day :)

Meh, I'm getting a crash when I run the full test suite with this patch. Will have to investigate.

zturner updated this revision to Diff 128677.Jan 4 2018, 4:28 PM

False alarm, no crash. This update makes the suggested naming change, and also adds a detailed comment to the getNumFpmIntervals function.

colden added inline comments.Jan 4 2018, 4:39 PM
llvm/include/llvm/DebugInfo/MSF/MSFCommon.h
143 ↗(On Diff #128677)

Sorry maybe I wasn't super clear before, but if you look at MSFCommon.cpp, getFpmStreamLayout, the AltFpm parameter there doesn't actually specify which Fpm block to use, but whether or not to use the Fpm block NOT referenced by the layout. To match that functionality, this should look more like:

int Fpm = L.SB->FreeBlockMapBlock;
return getNumFpmIntervals(L.SB->BlockSize, L.SB->NumBlocks,
                            IncludeUnusedFpmData, AltFpm ? 3U - Fpm  : Fpm);

Otherwise you could end up in situations where your active FPM is 2, but you want to initialize 1, so you pass true for AltFpm. That just gets you 2 anyway. This is how it's used in PDBFileBuilder::commitFpm, when it initializes the second Fpm but never uses it.

zturner added inline comments.Jan 4 2018, 4:45 PM
llvm/include/llvm/DebugInfo/MSF/MSFCommon.h
143 ↗(On Diff #128677)

Ahh yea, I see your point.

zturner updated this revision to Diff 128683.Jan 4 2018, 4:57 PM

Updated with suggestions

colden accepted this revision.Jan 4 2018, 4:58 PM

I'm assuming you'll want approval from someone else too, but this looks good to me, and is proven to fix my bug.

llvm/include/llvm/DebugInfo/MSF/MSFCommon.h
154 ↗(On Diff #128683)

Oh that's even cleaner.

This revision is now accepted and ready to land.Jan 4 2018, 4:58 PM
colden added a comment.Jan 5 2018, 9:28 AM

Oh hey also, should this go in 6.0?

This revision was automatically updated to reflect the committed changes.