This is an archive of the discontinued LLVM Phabricator instance.

[pdb] pad source file name buffer at the end instead of the beginning
ClosedPublic

Authored by inglorion on May 23 2017, 6:02 PM.

Details

Summary

DbiStreamBuilder calculated the offset of the source file names inside
the file info substream as the size of the file info substream minus
the size of the file names. Since the file info substream is padded to
a multiple of 4 bytes, this caused the first file name to be aligned
on a 4-byte boundary. By contrast, DbiModuleList would read the file
names immediately after the file name offset table, without skipping
to the next 4-byte boundary. This change makes it so that the file
names are written to the location where DbiModuleList expects them,
and puts any necessary padding for the file info substream after the
file names instead of before it.

Diff Detail

Repository
rL LLVM

Event Timeline

inglorion created this revision.May 23 2017, 6:02 PM
zturner edited edge metadata.May 23 2017, 7:48 PM

I think you should be able to write a test for this too using a similar technique as mentioned in the previous review. Just create a yaml file with a module that contains a couple of source files. In theory you can omit any field you don't care about.

amccarth edited edge metadata.May 24 2017, 8:15 AM

I'm slightly confused by the summary. This patch changes where the names are written to. How do we know that right versus changing where the names are read from?

I'm slightly confused by the summary. This patch changes where the names are written to. How do we know that right versus changing where the names are read from?

Hmm, you raise a good point. Bob, can you take a look at DBI1::reloadFileInfo in dbi.cpp in the reference implementation? I looked at it briefly and it does not appear to align the beginning of the names buffer, but it does align the size of the entire file info substream.

@amccarth, you raise a good point that I neglected to address before. I looked at changing the location we read from vs. changing the location we write to, and concluded that the location we read from is probably correct, because we haven't seen corrupted filenames in PDBs not generated by our tools, to my knowledge. Looking at the reference implementation, e.g. QueryFileInfo and QueryFileInfo2 have the alignment code after the code that deals with filenames. So it seems correct to write the filenames after the fileinfos, and then pad the subsection as necessary, rather than pad between the fileinfos and the filenames. The real proof will be once I actually get the filenames to be displayed by Microsoft's tools, but we're not quite there yet.

Thanks for checking! I'd like to see that conclusion in a comment somewhere appropriate and a test to make sure it doesn't regress.

Actually, I think there is a way to verify that the alignment is correct. If I generate a PDB from YAML using the old code (without this change) and dump it with llvm-pdbdump pretty -all, I get corrupted filenames. If I apply this change, generate the PDB from YAML again, and dump with llvm-pdbdump pretty -all, the filenames are correct. Since pretty uses the DIA library, I think that proves that the change I've implemented here expresses the correct behavior.

inglorion updated this revision to Diff 100171.May 24 2017, 3:04 PM

added test

amccarth accepted this revision.May 24 2017, 3:17 PM
amccarth added inline comments.
lib/DebugInfo/PDB/Native/DbiStreamBuilder.cpp
132 ↗(On Diff #100171)

Minor quibble with the name: This appears to calculate a size rather than an offset. But that's not really a big deal.

This revision is now accepted and ready to land.May 24 2017, 3:17 PM
inglorion added inline comments.May 24 2017, 3:23 PM
lib/DebugInfo/PDB/Native/DbiStreamBuilder.cpp
132 ↗(On Diff #100171)

It actually calculates the offset of the file names from the beginning of the file info substream. I think your confusion comes from the accumulator variable being named "Size", which I copy-pastad from claculateFileInfoSubstreamSize. I'll change it to "Offset".

inglorion updated this revision to Diff 100173.May 24 2017, 3:27 PM

renamed confusingly named variable and cut some redundant text from tests

inglorion updated this revision to Diff 100174.May 24 2017, 3:28 PM

removed trailing whitespace in yaml files

zturner added inline comments.May 24 2017, 4:06 PM
test/DebugInfo/PDB/Inputs/source-names-2.yaml
1–8 ↗(On Diff #100174)

Instead of having two different yaml files that are largely similar, how about 1 file with multiple source files?

---
DbiStream:
  Modules:
    - Module:          'C:\src\test.obj'
      ObjFile:         'C:\src\test.obj'
      SourceFiles:
        - 'C:\src\test.c'
        - 'C:\src\testx.c'
        - 'C:\src\testxx.c'
        - 'C:\src\testxxx.c'
...
test/DebugInfo/PDB/pdbdump-align-source-names.test
1–7 ↗(On Diff #100174)

It feels a bit like exposing too much of an implementation detail to say that we're testing padding of an internal field. Just call it a source file test (since we apparently didn't have one at all before)

16–20 ↗(On Diff #100174)

With one input file, this can just be:

CHECK: SourceFiles:
CHECK-NEXT: 'C:\src\test.c`
CHECK-NEXT: 'C:\src\testx.c`
CHECK-NEXT: 'C:\src\testxx.c`
CHECK-NEXT: 'C:\src\testxxx.c`
inglorion added inline comments.May 24 2017, 4:37 PM
test/DebugInfo/PDB/Inputs/source-names-2.yaml
1–8 ↗(On Diff #100174)

Wouldn't that run the risk of accidentally avoiding the problem? The bug this fixes is that we aligned the file names with the end of the file info substream (effectively putting 0-3 bytes of padding before the file names) instead of with the end of the previous field (effectively putting no padding before the file names and 0-3 bytes after). If we only have one test, that test could hit the case where the padding works out to 0 bytes and padding before cannot be distinguished from padding after. That's why I have two tests that only differ in the length of the file name.

zturner added inline comments.May 24 2017, 5:04 PM
test/DebugInfo/PDB/Inputs/source-names-2.yaml
1–8 ↗(On Diff #100174)

Ok, yea makes sense.

inglorion updated this revision to Diff 100293.May 25 2017, 1:42 PM

@zturner: I rewrote the comment to emphasize that we're testing reading and writing of source file names, with alignment more as a side note. Do you like it better this way?

zturner accepted this revision.May 25 2017, 1:52 PM

Sorry, yea I meant to LGTM this yesterday.

No worries, thanks for your comments! Landing now.

This revision was automatically updated to reflect the committed changes.