This is an archive of the discontinued LLVM Phabricator instance.

[CodeView] Fix alignment / padding when writing symbol records
ClosedPublic

Authored by zturner on Jun 1 2017, 9:54 AM.

Details

Summary

The rabbit hole runs deep on this one. Apparently when writing object files, symbol records are 1-byte aligned, but when writing PDB files, symbols are 4-byte aligned. I confirmed this through the following sequence of steps:

  1. Starting with test/tools/llvm-readobj/codeview-types.test, make it possible to build the code by deleting the #error and adding definitions of the various functions and adding a main function so that it can link into a valid executable.
  2. Use the instructions in the aforementioned file to regenerate the object file on a modern version of MSVC.
  3. Use llvm-readobj -codeview -codeview-subsection-bytes codeview-types.obj to dump the records. Save this output somewhere
  4. Link the object file using link /debug /out:codeview-types.exe codeview-types.obj, which should produce codeview-types.pdb
  5. Dump the pdb using llvm-pdbdump raw -dbi-module-syms -sym-record-bytes codeview-types.pdb and save the output somewhere.

When you look at the first file, you can find these two records (among plenty of others):

ObjectName {
  Signature: 0x0
  ObjectName: D:\src\llvmbuild\ninja\codeview-types-cl.obj
}
SymData (
  0000: 00000000 443A5C73 72635C6C 6C766D62  |....D:\src\llvmb|
  0010: 75696C64 5C6E696E 6A615C63 6F646576  |uild\ninja\codev|
  0020: 6965772D 74797065 732D636C 2E6F626A  |iew-types-cl.obj|
  0030: 00                                   |.|
)
FrameProc {
  TotalFrameBytes: 0x28
  PaddingFrameBytes: 0x0
  OffsetToPadding: 0x0
  BytesOfCalleeSavedRegisters: 0x0
  OffsetOfExceptionHandler: 0x0
  SectionIdOfExceptionHandler: 0x0
  Flags [ (0x128200)
    AsynchronousExceptionHandling (0x200)
    OptimizedForSpeed (0x100000)
  ]
}
SymData (
  0000: 28000000 00000000 00000000 00000000  |(...............|
  0010: 00000000 00000082 1200               |..........|
)

Note that the first record consists of 49 bytes and the second consists of 26 bytes, neither of which is a multiple of 4. If you track down the same records in the dumped PDB file, they look like this:

{
  ObjectName {
    Signature: 0x0
    ObjectName: D:\src\llvmbuild\ninja\codeview-types-cl.obj
  }
  Bytes (
    0000: 00000000 443A5C73 72635C6C 6C766D62  |....D:\src\llvmb|
    0010: 75696C64 5C6E696E 6A615C63 6F646576  |uild\ninja\codev|
    0020: 6965772D 74797065 732D636C 2E6F626A  |iew-types-cl.obj|
    0030: 00000000                             |....|
  )
}
{
  FrameProc {
    TotalFrameBytes: 0x28
    PaddingFrameBytes: 0x0
    OffsetToPadding: 0x0
    BytesOfCalleeSavedRegisters: 0x0
    OffsetOfExceptionHandler: 0x0
    SectionIdOfExceptionHandler: 0x0
    Flags [ (0x128200)
      AsynchronousExceptionHandling (0x200)
      OptimizedForSpeed (0x100000)
    ]
  }
  Bytes (
    0000: 28000000 00000000 00000000 00000000  |(...............|
    0010: 00000000 00000082 12000000           |............|
  )

and here we can see that both records have been zero-padded to 4 bytes but are otherwise identical.

Since we use the same code for reading / writing symbols to / from object files / PDB files, this is kind of annoying. To address this, I added an alignment field to the low level SymbolSerializer and SymbolDeserializer classes and propagate this value all the way up to the top-level client, in this case llvm-pdbdump, llvm-readobj, lld, etc and let them specify the alignment based on whether they're handling PDB files or object files.

After this change, we can round-trip symbol records from PDB -> YAML -> PDB, and I modified the existing round-tripping test to verify that this does in fact work now.. There are still about 5 bytes that differ between the original stream's symbol records and the new stream's symbol records, but at least it doesn't crash and the records look the same when re dumped. We can investigate the remaining byte differences in a followup patch.

Another interesting (and annoying) thing I learned is that either Microsoft compiles with #pragma pack(2) or #pragma pack(1) or the symbol record definitions in cvrec.h aren't authoritative. I went off on a wild goose chase for a couple of hours after noticing that FRAMEPROCSYM contains a uint16 followed by a uint32, and so there are 2 bytes of padding in there that we aren't accounting for. I confirmed this with cl /d1reportSingleClassRecordLayoutFRAMEPROCSYM against their structure definition. But this broke many of the existing tests in llvm-readobj. It turns out this 32-bit enum is *actually* written unaligned in the official implementation, so the code was originally correct. Also confirmed by running cvdump -s on an object file, and realizing that the padding to 4 bytes comes *after* the enum, not before.

Ugh. I want my night back.

Diff Detail

Repository
rL LLVM

Event Timeline

zturner created this revision.Jun 1 2017, 9:54 AM
rnk accepted this revision.Jun 1 2017, 10:08 AM

BTW, cvinfo.h does in fact have #pragma pack ( push, 1 ) in it.

Looks good. :)

llvm/lib/DebugInfo/CodeView/CodeViewRecordIO.cpp
30 ↗(On Diff #101040)

It sounds like we can't have either of these assertions, and I'd rather not have commented out code and this dead if. Can we simplify this to say that some writers (such as MASM) over-allocate for pointer records, and we over-allocate when writing?

This revision is now accepted and ready to land.Jun 1 2017, 10:08 AM
inglorion edited edge metadata.Jun 1 2017, 11:30 AM

Thanks for fixing this, @zturner! I bumped into this a few days ago, but never really came up with a fix I was confident was the right thing.

Besides a few inline comments, there are also a few things I'd like you to add while you're at it. Specifically, in ModuleDebugFileChecksumFragment::commit() and ModuleDebugFragmentRecordBuilder::commit(), we use BinaryStreamWriter::padToAlignment() to pad the records we're writing. However, this actually pads the stream to a multiple of the alignment instead of the record, and so will only do the right thing if the stream was aligned before we wrote the record. I think we expect this to always be true. Can you add asserts to the beginning of the commit() methods to verify that the stream is 4-byte aligned at the beginning? Having those asserts there would have saved me some time while I was debugging the problem you're fixing here.

llvm/lib/DebugInfo/CodeView/CodeViewRecordIO.cpp
30 ↗(On Diff #101040)

I agree with rnk that we should probably not have a dead if here. I do like the explanation. I'm ok with a simpler explanation of why we can't verify that the size matches exactly, but I'm also ok with the explanation as it stands. But if we're unable to actually assert anything, please do remove the dead code.

llvm/lib/DebugInfo/PDB/Native/DbiModuleDescriptorBuilder.cpp
71 ↗(On Diff #101040)

Can you change this to

assert(Symbol.length % 4 == 0 && "Symbol records must be a multiple of 4 bytes when writing PDBs")

or some such so that we get the explanation right in the error message? Then you can remove the comment.

llvm/tools/llvm-pdbdump/llvm-pdbdump.cpp
540 ↗(On Diff #101040)

I'm ok with this as-is, but since I'm asking for some changes I'd like to take the opportunity to suggest some alternatives to consider:

One would be to replace the 1 and 4 with named constants (say ObjectFileSymbolAlignment and PDBSymbolAlignment). That way, the code expresses the intent, you won't need to add the comment, and readers won't have to wonder what 1 or 4 means.

The other alternative is to do something like toCOFFCodeViewSymbol() and toPDBCodeViewSymbol(). I think that would also require replacing a few constructor calls (such as CVSymbolDumper in COFFDumper.cpp) with static method invocations, so may be more effort than it is worth.

zturner updated this revision to Diff 101093.Jun 1 2017, 1:16 PM

Updated with suggestions from inglorion. Note that I actually fixed a bug with this update of the patch. We weren't respecting the PDB / Object File alignment difference when writing the debug subsections, only when writing symbol records. We still had the padding hardcoded to 4 in the debug subsections. This definitely isn't true though, as I currently have this from a dump of an MSVC generated object file.

CodeViewDebugInfo [
  Section: .debug$S (2)
  Magic: 0x4
  Subsection [
    SubSectionType: Symbols (0xF1)
    SubSectionSize: 0x71
    SubSectionContents (
      0000: 33000111 00000000 443A5C73 72635C6C  |3.......D:\src\l|
      0010: 6C766D62 75696C64 5C6E696E 6A615C63  |lvmbuild\ninja\c|
      0020: 6F646576 6965772D 74797065 732D636C  |odeview-types-cl|
      0030: 2E6F626A 003A003C 11012000 00070013  |.obj.:.<.. .....|
      0040: 000A00BB 61000013 000A00BB 6100004D  |....a.......a..M|
      0050: 6963726F 736F6674 20285229 204F7074  |icrosoft (R) Opt|
      0060: 696D697A 696E6720 436F6D70 696C6572  |imizing Compiler|
      0070: 00                                   |.|
    )
...

So in this patch I also update the `DebugSubsectionRecordBuilder` to use the appropriate
alignment when computing its size and when writing itself.
rnk added a comment.Jun 1 2017, 1:20 PM

Updated with suggestions from inglorion. Note that I actually fixed a bug with this update of the patch. We weren't respecting the PDB / Object File alignment difference when writing the debug subsections, only when writing symbol records. We still had the padding hardcoded to 4 in the debug subsections. This definitely isn't true though, as I currently have this from a dump of an MSVC generated object file.

Do you think there is any harm to writing aligned symbol records into object files? If we were writing aligned records into object files before, I'd kind of like to maintain that property.

I don't think it should matter, because each record describes its own length, and that is the authority on how many bytes we need to skip to get to the next record.

On the other hand, I also don't think this patch really affects that. We do all our writing of object files through CodeViewDebug.cpp and that doesn't use these classes (at least for the time being, it would be nice to merge these in the future). So whatever object files we produced before, we will continue to produce.

So with that said, why bother even supporting both alignment styles? It's nice to have this documented in code somehow, and it gives us some flexibility in the future if we ever decide that we must do it the way Microsoft does it that we can support whatever method we need with the flip of a switch.

rnk added a comment.Jun 1 2017, 1:57 PM

I guess I'm saying it seems nice to always write aligned records until we come up with some reason to not write aligned records. Having the flexibility to do either seems good. We definitely have to read the unaligned records.

In D33785#770614, @rnk wrote:

I guess I'm saying it seems nice to always write aligned records until we come up with some reason to not write aligned records. Having the flexibility to do either seems good. We definitely have to read the unaligned records.

That's an interesting topic for another patch. As Zach says, this doesn't affect writing the objects.

And even if it did, I'd be mildly concerned about aligning output more strictly than the tools we're trying to be compatible with. If all our linker tests (for example) were based on neatly aligned object files, it would be easy for a bug to slip into the linker that assumes objects will always be neatly aligned.

I guess I'm saying it seems nice to always write aligned records until we come up with some reason to not write aligned records.

@zturner and I had talked about that a little bit before. I mentioned that unaligned records may save some space. They may be slightly less efficient to process, but that could be a good trade-off in object files. I'm not sure how large these records usually are or what the effect on size or speed would be if we always aligned or always packed or how much leeway we have in simplifying the code while being interoperable with other tools. For now, I prefer to focus on getting usable PDBs out of lld; we can perhaps revisit the alignment issue after that if it seems warranted.

inglorion accepted this revision.Jun 1 2017, 2:38 PM

lgtm modulo inline comment about messsage in the assert

llvm/lib/DebugInfo/PDB/Native/DbiModuleDescriptorBuilder.cpp
71 ↗(On Diff #101040)

Please still add a message to the assert here.

This revision was automatically updated to reflect the committed changes.