DbiStreamBuilder calculated the offset of the source file names inside
the file info substream as the size of the file info substream minus
the size of the file names. Since the file info substream is padded to
a multiple of 4 bytes, this caused the first file name to be aligned
on a 4-byte boundary. By contrast, DbiModuleList would read the file
names immediately after the file name offset table, without skipping
to the next 4-byte boundary. This change makes it so that the file
names are written to the location where DbiModuleList expects them,
and puts any necessary padding for the file info substream after the
file names instead of before it.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
I think you should be able to write a test for this too using a similar technique as mentioned in the previous review. Just create a yaml file with a module that contains a couple of source files. In theory you can omit any field you don't care about.
I'm slightly confused by the summary. This patch changes where the names are written to. How do we know that right versus changing where the names are read from?
Hmm, you raise a good point. Bob, can you take a look at DBI1::reloadFileInfo in dbi.cpp in the reference implementation? I looked at it briefly and it does not appear to align the beginning of the names buffer, but it does align the size of the entire file info substream.
@amccarth, you raise a good point that I neglected to address before. I looked at changing the location we read from vs. changing the location we write to, and concluded that the location we read from is probably correct, because we haven't seen corrupted filenames in PDBs not generated by our tools, to my knowledge. Looking at the reference implementation, e.g. QueryFileInfo and QueryFileInfo2 have the alignment code after the code that deals with filenames. So it seems correct to write the filenames after the fileinfos, and then pad the subsection as necessary, rather than pad between the fileinfos and the filenames. The real proof will be once I actually get the filenames to be displayed by Microsoft's tools, but we're not quite there yet.
Thanks for checking! I'd like to see that conclusion in a comment somewhere appropriate and a test to make sure it doesn't regress.
Actually, I think there is a way to verify that the alignment is correct. If I generate a PDB from YAML using the old code (without this change) and dump it with llvm-pdbdump pretty -all, I get corrupted filenames. If I apply this change, generate the PDB from YAML again, and dump with llvm-pdbdump pretty -all, the filenames are correct. Since pretty uses the DIA library, I think that proves that the change I've implemented here expresses the correct behavior.
lib/DebugInfo/PDB/Native/DbiStreamBuilder.cpp | ||
---|---|---|
132 ↗ | (On Diff #100171) | Minor quibble with the name: This appears to calculate a size rather than an offset. But that's not really a big deal. |
lib/DebugInfo/PDB/Native/DbiStreamBuilder.cpp | ||
---|---|---|
132 ↗ | (On Diff #100171) | It actually calculates the offset of the file names from the beginning of the file info substream. I think your confusion comes from the accumulator variable being named "Size", which I copy-pastad from claculateFileInfoSubstreamSize. I'll change it to "Offset". |
test/DebugInfo/PDB/Inputs/source-names-2.yaml | ||
---|---|---|
1–8 ↗ | (On Diff #100174) | Instead of having two different yaml files that are largely similar, how about 1 file with multiple source files? --- DbiStream: Modules: - Module: 'C:\src\test.obj' ObjFile: 'C:\src\test.obj' SourceFiles: - 'C:\src\test.c' - 'C:\src\testx.c' - 'C:\src\testxx.c' - 'C:\src\testxxx.c' ... |
test/DebugInfo/PDB/pdbdump-align-source-names.test | ||
1–7 ↗ | (On Diff #100174) | It feels a bit like exposing too much of an implementation detail to say that we're testing padding of an internal field. Just call it a source file test (since we apparently didn't have one at all before) |
16–20 ↗ | (On Diff #100174) | With one input file, this can just be: CHECK: SourceFiles: CHECK-NEXT: 'C:\src\test.c` CHECK-NEXT: 'C:\src\testx.c` CHECK-NEXT: 'C:\src\testxx.c` CHECK-NEXT: 'C:\src\testxxx.c` |
test/DebugInfo/PDB/Inputs/source-names-2.yaml | ||
---|---|---|
1–8 ↗ | (On Diff #100174) | Wouldn't that run the risk of accidentally avoiding the problem? The bug this fixes is that we aligned the file names with the end of the file info substream (effectively putting 0-3 bytes of padding before the file names) instead of with the end of the previous field (effectively putting no padding before the file names and 0-3 bytes after). If we only have one test, that test could hit the case where the padding works out to 0 bytes and padding before cannot be distinguished from padding after. That's why I have two tests that only differ in the length of the file name. |
test/DebugInfo/PDB/Inputs/source-names-2.yaml | ||
---|---|---|
1–8 ↗ | (On Diff #100174) | Ok, yea makes sense. |
@zturner: I rewrote the comment to emphasize that we're testing reading and writing of source file names, with alignment more as a side note. Do you like it better this way?