
Details

Reviewers

amccarth
rnk
zturner

Commits

rG55256ada254f: [pdb] pad source file name buffer at the end instead of the beginning
rL303917: [pdb] pad source file name buffer at the end instead of the beginning

Summary

DbiStreamBuilder calculated the offset of the source file names inside
the file info substream as the size of the file info substream minus
the size of the file names. Since the file info substream is padded to
a multiple of 4 bytes, this caused the first file name to be aligned
on a 4-byte boundary. By contrast, DbiModuleList would read the file
names immediately after the file name offset table, without skipping
to the next 4-byte boundary. This change makes it so that the file
names are written to the location where DbiModuleList expects them,
and puts any necessary padding for the file info substream after the
file names instead of before it.
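To make the arithmetic in the summary concrete, here is a minimal sketch of the two offset computations. The helper names (metadataSize, oldNamesOffset, newNamesOffset) are hypothetical and not part of LLVM; the field sizes follow the comments in DbiStreamBuilder.cpp shown in this diff, and the model assumes the file info substream holds nothing else.

```cpp
#include <cassert>
#include <cstdint>

// Round Value up to the next multiple of Align.
uint32_t alignTo(uint32_t Value, uint32_t Align) {
  assert(Align != 0 && "alignment must be nonzero");
  return (Value + Align - 1) / Align * Align;
}

// Bytes that precede the names buffer in this simplified model.
uint32_t metadataSize(uint32_t NumModules, uint32_t NumFileInfos) {
  uint32_t Size = 0;
  Size += 2;                // NumModules
  Size += 2;                // NumSourceFiles
  Size += NumModules * 2;   // ModIndices
  Size += NumModules * 2;   // ModFileCounts
  Size += NumFileInfos * 4; // FileNameOffsets
  return Size;
}

// Old behavior: names end at the padded end of the substream,
// so any padding lands in front of them.
uint32_t oldNamesOffset(uint32_t NumModules, uint32_t NumFileInfos,
                        uint32_t NamesSize) {
  uint32_t Total = alignTo(metadataSize(NumModules, NumFileInfos) + NamesSize, 4);
  return Total - NamesSize;
}

// New behavior: names start right after the offset table,
// so any padding lands behind them.
uint32_t newNamesOffset(uint32_t NumModules, uint32_t NumFileInfos) {
  return metadataSize(NumModules, NumFileInfos);
}
```

With one module, one source file, and a 14-byte names buffer, oldNamesOffset(1, 1, 14) is 14 while newNamesOffset(1, 1) is 12, which is where a reader that follows the offset table directly would look. The two agree only when the unpadded size is already a multiple of 4.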

Diff Detail

Build Status

Buildable 6742
Build 6742: arc lint + arc unit

Event Timeline

inglorion created this revision. May 23 2017, 6:02 PM

Harbormaster completed remote builds in B6712: Diff 100026. May 23 2017, 6:02 PM

I think you should be able to write a test for this too using a similar technique as mentioned in the previous review. Just create a yaml file with a module that contains a couple of source files. In theory you can omit any field you don't care about.

I'm slightly confused by the summary. This patch changes where the names are written to. How do we know that's right, versus changing where the names are read from?

In D33475#763306, @amccarth wrote:

I'm slightly confused by the summary. This patch changes where the names are written to. How do we know that's right, versus changing where the names are read from?

Hmm, you raise a good point. Bob, can you take a look at DBI1::reloadFileInfo in dbi.cpp in the reference implementation? I looked at it briefly and it does not appear to align the beginning of the names buffer, but it does align the size of the entire file info substream.

@amccarth, you raise a good point that I neglected to address before. I looked at changing the location we read from vs. changing the location we write to, and concluded that the location we read from is probably correct, because we haven't seen corrupted filenames in PDBs not generated by our tools, to my knowledge. Looking at the reference implementation, e.g. QueryFileInfo and QueryFileInfo2 have the alignment code after the code that deals with filenames. So it seems correct to write the filenames after the fileinfos, and then pad the subsection as necessary, rather than pad between the fileinfos and the filenames. The real proof will be once I actually get the filenames to be displayed by Microsoft's tools, but we're not quite there yet.
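The effect described above can be reproduced with a small round trip. This is only a sketch under a simplified model (one module, one source file, names offset 0), not the LLVM implementation; writeFileInfo, readFirstName, and the Padding enum are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

enum class Padding { BeforeNames, AfterNames };

// Append the low NumBytes bytes of Value in little-endian order.
static void appendLE(std::vector<uint8_t> &Out, uint32_t Value,
                     unsigned NumBytes) {
  for (unsigned I = 0; I < NumBytes; ++I)
    Out.push_back(static_cast<uint8_t>(Value >> (8 * I)));
}

// Write a toy file info substream for one module with one source file.
std::vector<uint8_t> writeFileInfo(const std::string &Name, Padding Where) {
  std::vector<uint8_t> Out;
  appendLE(Out, 1, 2); // NumModules
  appendLE(Out, 1, 2); // NumSourceFiles
  appendLE(Out, 0, 2); // ModIndices[0]
  appendLE(Out, 1, 2); // ModFileCounts[0]
  appendLE(Out, 0, 4); // FileNameOffsets[0]

  std::size_t NamesSize = Name.size() + 1; // include the NUL terminator
  std::size_t Pad = (4 - (Out.size() + NamesSize) % 4) % 4;

  if (Where == Padding::BeforeNames)
    Out.insert(Out.end(), Pad, 0);
  Out.insert(Out.end(), Name.begin(), Name.end());
  Out.push_back(0);
  if (Where == Padding::AfterNames)
    Out.insert(Out.end(), Pad, 0);
  return Out;
}

// Read the first name the way the summary says the reader does:
// immediately after the offset table, with no alignment skip.
std::string readFirstName(const std::vector<uint8_t> &In) {
  const std::size_t NamesStart = 12; // 2 + 2 + 2 + 2 + 4 in this toy layout
  assert(In.size() > NamesStart && "buffer too small for the toy layout");
  std::string Name;
  for (std::size_t I = NamesStart; I < In.size() && In[I] != 0; ++I)
    Name.push_back(static_cast<char>(In[I]));
  return Name;
}
```

In this model, writing 'C:\src\test.c' with Padding::BeforeNames places two zero bytes at the position the reader starts from, so the name reads back as empty; with Padding::AfterNames it reads back intact. Both variants produce a buffer of the same padded size.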

Thanks for checking! I'd like to see that conclusion in a comment somewhere appropriate and a test to make sure it doesn't regress.

Actually, I think there is a way to verify that the alignment is correct. If I generate a PDB from YAML using the old code (without this change) and dump it with llvm-pdbdump pretty -all, I get corrupted filenames. If I apply this change, generate the PDB from YAML again, and dump with llvm-pdbdump pretty -all, the filenames are correct. Since pretty uses the DIA library, I think that proves that the change I've implemented here expresses the correct behavior.

added test

Harbormaster completed remote builds in B6740: Diff 100171. May 24 2017, 3:04 PM

amccarth accepted this revision. May 24 2017, 3:17 PM

amccarth added inline comments.

lib/DebugInfo/PDB/Native/DbiStreamBuilder.cpp
131–132	Minor quibble with the name: This appears to calculate a size rather than an offset. But that's not really a big deal.

This revision is now accepted and ready to land. May 24 2017, 3:17 PM

inglorion added inline comments. May 24 2017, 3:23 PM

lib/DebugInfo/PDB/Native/DbiStreamBuilder.cpp
131–132	It actually calculates the offset of the file names from the beginning of the file info substream. I think your confusion comes from the accumulator variable being named "Size", which I copy-pasted from calculateFileInfoSubstreamSize. I'll change it to "Offset".

renamed confusingly named variable and cut some redundant text from tests

removed trailing whitespace in yaml files

zturner added inline comments. May 24 2017, 4:06 PM

test/DebugInfo/PDB/Inputs/source-names-2.yaml
2–9	Instead of having two different yaml files that are largely similar, how about 1 file with multiple source files?

---
DbiStream:
  Modules:
    - Module: 'C:\src\test.obj'
      ObjFile: 'C:\src\test.obj'
      SourceFiles:
        - 'C:\src\test.c'
        - 'C:\src\testx.c'
        - 'C:\src\testxx.c'
        - 'C:\src\testxxx.c'
...
test/DebugInfo/PDB/pdbdump-align-source-names.test
2–8	It feels a bit like exposing too much of an implementation detail to say that we're testing padding of an internal field. Just call it a source file test (since we apparently didn't have one at all before)
17–21	With one input file, this can just be:

CHECK: SourceFiles:
CHECK-NEXT: 'C:\src\test.c'
CHECK-NEXT: 'C:\src\testx.c'
CHECK-NEXT: 'C:\src\testxx.c'
CHECK-NEXT: 'C:\src\testxxx.c'

inglorion added inline comments. May 24 2017, 4:37 PM

test/DebugInfo/PDB/Inputs/source-names-2.yaml
2–9	Wouldn't that run the risk of accidentally avoiding the problem? The bug this fixes is that we aligned the file names with the end of the file info substream (effectively putting 0-3 bytes of padding before the file names) instead of with the end of the previous field (effectively putting no padding before the file names and 0-3 bytes after). If we only have one test, that test could hit the case where the padding works out to 0 bytes and padding before cannot be distinguished from padding after. That's why I have two tests that only differ in the length of the file name.
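The concern about a single test hitting the zero-padding case can be checked with a little arithmetic. The sketch below is a simplified model, not LLVM code: trailingPadding and fixtureDetectsMisplacedPadding are hypothetical helpers, and the model assumes each listed name is referenced exactly once and that nothing else is stored in the substream.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Padding bytes needed to round the modeled file info substream up to a
// multiple of 4, given the module count and the source file names.
uint32_t trailingPadding(uint32_t NumModules,
                         const std::vector<std::string> &Names) {
  assert(!Names.empty() && "a fixture without names cannot exercise the bug");
  uint32_t Size = 0;
  Size += 2;                                       // NumModules
  Size += 2;                                       // NumSourceFiles
  Size += NumModules * 2;                          // ModIndices
  Size += NumModules * 2;                          // ModFileCounts
  Size += static_cast<uint32_t>(Names.size()) * 4; // FileNameOffsets
  for (const std::string &N : Names)
    Size += static_cast<uint32_t>(N.size()) + 1;   // name plus NUL terminator
  return (4 - Size % 4) % 4;
}

// Padding-before and padding-after only differ when there is padding at all.
bool fixtureDetectsMisplacedPadding(uint32_t NumModules,
                                    const std::vector<std::string> &Names) {
  return trailingPadding(NumModules, Names) != 0;
}
```

Under these assumptions, a one-module fixture with 'C:\src\test.c' needs 2 bytes of padding and one with 'C:\src\test.cc' needs 1, so either would expose misplaced padding, whereas a hypothetical name such as 'C:\src\test.cpp' would need 0 and could not tell the two layouts apart. Using names of different lengths guards against landing on that blind spot by accident.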

zturner added inline comments. May 24 2017, 5:04 PM

test/DebugInfo/PDB/Inputs/source-names-2.yaml
2–9	Ok, yea makes sense.

@zturner: I rewrote the comment to emphasize that we're testing reading and writing of source file names, with alignment more as a side note. Do you like it better this way?

Harbormaster completed remote builds in B6770: Diff 100293. May 25 2017, 1:42 PM

Sorry, yea I meant to LGTM this yesterday.

No worries, thanks for your comments! Landing now.

Closed by commit rL303917: [pdb] pad source file name buffer at the end instead of the beginning (authored by inglorion). May 25 2017, 2:12 PM

This revision was automatically updated to reflect the committed changes.

Diff 100173

include/llvm/DebugInfo/PDB/Native/DbiStreamBuilder.h

[76 lines not shown]
 private:
   struct DebugStream {
     ArrayRef<uint8_t> Data;
     uint16_t StreamNumber = 0;
   };

   Error finalize();
   uint32_t calculateModiSubstreamSize() const;
+  uint32_t calculateNamesOffset() const;
   uint32_t calculateSectionContribsStreamSize() const;
   uint32_t calculateSectionMapStreamSize() const;
   uint32_t calculateFileInfoSubstreamSize() const;
   uint32_t calculateNamesBufferSize() const;
   uint32_t calculateDbgStreamsSize() const;

   Error generateModiSubstream();
   Error generateFileInfoSubstream();
[29 lines not shown]

lib/DebugInfo/PDB/Native/DbiStreamBuilder.cpp

[122 lines not shown]
   return sizeof(enum PdbRaw_DbiSecContribVer) +
          sizeof(SectionContribs[0]) * SectionContribs.size();
 }

 uint32_t DbiStreamBuilder::calculateSectionMapStreamSize() const {
   if (SectionMap.empty())
     return 0;
   return sizeof(SecMapHeader) + sizeof(SecMapEntry) * SectionMap.size();
 }

-uint32_t DbiStreamBuilder::calculateFileInfoSubstreamSize() const {
-  uint32_t Size = 0;
-  Size += sizeof(ulittle16_t); // NumModules
-  Size += sizeof(ulittle16_t); // NumSourceFiles
-  Size += ModiList.size() * sizeof(ulittle16_t); // ModIndices
-  Size += ModiList.size() * sizeof(ulittle16_t); // ModFileCounts
+uint32_t DbiStreamBuilder::calculateNamesOffset() const {
+  uint32_t Offset = 0;
+  Offset += sizeof(ulittle16_t); // NumModules
+  Offset += sizeof(ulittle16_t); // NumSourceFiles
+  Offset += ModiList.size() * sizeof(ulittle16_t); // ModIndices
+  Offset += ModiList.size() * sizeof(ulittle16_t); // ModFileCounts
   uint32_t NumFileInfos = 0;
   for (const auto &M : ModiList)
     NumFileInfos += M->source_files().size();
-  Size += NumFileInfos * sizeof(ulittle32_t); // FileNameOffsets
+  Offset += NumFileInfos * sizeof(ulittle32_t); // FileNameOffsets
+  return Offset;
+}
+
+uint32_t DbiStreamBuilder::calculateFileInfoSubstreamSize() const {
+  uint32_t Size = calculateNamesOffset();
   Size += calculateNamesBufferSize();
   return alignTo(Size, sizeof(uint32_t));
 }

 uint32_t DbiStreamBuilder::calculateNamesBufferSize() const {
   uint32_t Size = 0;
   for (const auto &F : SourceFileNames) {
     Size += F.getKeyLength() + 1; // Names[I];
   }
   return Size;
 }

 uint32_t DbiStreamBuilder::calculateDbgStreamsSize() const {
   return DbgStreams.size() * sizeof(uint16_t);
 }

 Error DbiStreamBuilder::generateFileInfoSubstream() {
   uint32_t Size = calculateFileInfoSubstreamSize();
-  uint32_t NameSize = calculateNamesBufferSize();
   auto Data = Allocator.Allocate<uint8_t>(Size);
-  uint32_t NamesOffset = Size - NameSize;
+  uint32_t NamesOffset = calculateNamesOffset();

   FileInfoBuffer = MutableBinaryByteStream(MutableArrayRef<uint8_t>(Data, Size),
                                            llvm::support::little);

   WritableBinaryStreamRef MetadataBuffer =
       WritableBinaryStreamRef(FileInfoBuffer).keep_front(NamesOffset);
   BinaryStreamWriter MetadataWriter(MetadataBuffer);

[31 lines not shown]
     for (StringRef Name : MI->source_files()) {
       if (Result == SourceFileNames.end())
         return make_error<RawError>(raw_error_code::no_entry,
                                     "The source file was not found.");
       if (auto EC = MetadataWriter.writeInteger(Result->second))
         return EC;
     }
   }

+  if (auto EC = NameBufferWriter.padToAlignment(sizeof(uint32_t)))
+    return EC;
+
   if (NameBufferWriter.bytesRemaining() > 0)
     return make_error<RawError>(raw_error_code::invalid_format,
                                 "The names buffer contained unexpected data.");

   if (MetadataWriter.bytesRemaining() > sizeof(uint32_t))
     return make_error<RawError>(
         raw_error_code::invalid_format,
         "The metadata buffer contained unexpected data.");
[185 lines not shown]

test/DebugInfo/PDB/Inputs/source-names-1.yaml

This file was added.

---
DbiStream:
  Modules:
    - Module: 'C:\src\test.obj'
      ObjFile: 'C:\src\test.obj'
      SourceFiles:
        - 'C:\src\test.c'
...

test/DebugInfo/PDB/Inputs/source-names-2.yaml

This file was added.

---
DbiStream:
  Modules:
    - Module: 'C:\src\test.obj'
      ObjFile: 'C:\src\test.obj'
      SourceFiles:
        - 'C:\src\test.cc'
...

test/DebugInfo/PDB/pdbdump-align-source-names.test

This file was added.

# Module source file names are contained in the file info substream. The
# substream is padded to a multiple of 4 bytes, and this padding must come
# after the file names. Putting it before the file names results in the
# possibility of source file names being read as empty or truncated. This test
# uses file names of two different lengths to make sure at least one hits the
# case where padding is needed and verifies that file names are correct in
# both cases.

RUN: llvm-pdbdump yaml2pdb -pdb=%T/source-names-1.pdb %p/Inputs/source-names-1.yaml
RUN: llvm-pdbdump pdb2yaml -dbi-module-source-info %T/source-names-1.pdb \
RUN:   | FileCheck -check-prefix=CHECK1 %s
RUN: llvm-pdbdump yaml2pdb -pdb=%T/source-names-2.pdb %p/Inputs/source-names-2.yaml
RUN: llvm-pdbdump pdb2yaml -dbi-module-source-info %T/source-names-2.pdb \
RUN:   | FileCheck -check-prefix=CHECK2 %s

CHECK1: SourceFiles:
CHECK1: 'C:\src\test.c'

CHECK2: SourceFiles:
CHECK2: 'C:\src\test.cc'

This is an archive of the discontinued LLVM Phabricator instance.

[pdb] pad source file name buffer at the end instead of the beginning
ClosedPublic
