This is an archive of the discontinued LLVM Phabricator instance.

Support skewed Stream Arrays
ClosedPublic

Authored by zturner on Dec 5 2018, 3:45 PM.

Details

Summary

VarStreamArray was built on the assumption that it is backed by a StreamRef, and offset 0 of that StreamRef is the first byte of the first record in the array.

This is a logical and intuitive assumption, but unfortunately we have use cases where it doesn't hold. Specifically, a PDB module's symbol stream is prefixed by 4 bytes containing a magic value, and the first byte of record data in the array is actually at offset 4 of this byte sequence.

Previously, we would just truncate the first 4 bytes and then construct the VarStreamArray with the resulting StreamRef, so that offset 0 of the underlying stream did correspond to the first byte of the first record, but this is problematic, because symbol records reference other symbol records by the absolute offset including that initial magic 4 bytes. So if another record wants to refer to the first record in the array, it would say "the record at offset 4".

This led to extremely confusing hacks and semantics in loading code, and after spending 30 minutes trying to get some math right and failing, I decided to fix this in the underlying implementation of VarStreamArray. Now, we can say that a stream is skewed by a particular amount. This way, when we access a record by absolute offset, we can use the same values that the records themselves contain, instead of having to do fixups.

Diff Detail

Repository
rL LLVM

Event Timeline

zturner created this revision.Dec 5 2018, 3:45 PM
aganea accepted this revision.Dec 6 2018, 6:10 AM

LGTM, thank you for cleaning that up!

llvm/lib/DebugInfo/PDB/Native/ModuleDebugStream.cpp
60 ↗(On Diff #176888)

Could you please indicate in a comment what sizeof(uint32_t) refers to? Eg.

if (auto EC = SymbolReader.readArray(
        SymbolArray, SymbolReader.bytesRemaining(), /*skip Signature*/ sizeof(uint32_t)))
This revision is now accepted and ready to land.Dec 6 2018, 6:10 AM
aganea added inline comments.Dec 6 2018, 6:12 AM
llvm/lib/DebugInfo/PDB/Native/ModuleDebugStream.cpp
50 ↗(On Diff #176888)

Maybe a one-line explanation would be needed here too? To explain why we're reading the Signature, then reseting, the reading the whole stream?

This revision was automatically updated to reflect the committed changes.

If there is a magic value, should we be checking it?

Does the skew need to be taken into account in any existing boundary checks?

lldb/trunk/source/Plugins/SymbolFile/NativePDB/PdbIndex.cpp
199

So is the above comment obsolete now?

llvm/trunk/lib/DebugInfo/PDB/Native/ModuleDebugStream.cpp
60

Perhaps a comment should go here to explain that the sizeof(uint32_t) accounts for CodeView's magic value at the beginning of the stream.