This is an archive of the discontinued LLVM Phabricator instance.

Differential D75081

[CodeView] Align type records on 4-bytes when emitting PDBs
ClosedPublic

Authored by aganea on Feb 24 2020, 2:35 PM.

Details

Reviewers

rnk
zturner
jhenderson
amccarth
ruiu

Commits

rGe3f3a43807d2: [CodeView] Align type records on 4-bytes when emitting PDBs
rG6196695ec581: [CodeView] Align type records on 4-bytes when emitting PDBs
rGa7325298e1f3: [CodeView] Align type records on 4-bytes when emitting PDBs

Summary

When emitting PDBs, the TypeStreamMerger class is used to merge the .debug$T records from the input .OBJ files into the output .PDB stream.
Records in .OBJ files are not required to be aligned on 4 bytes, and "The Netwide Assembler 2.14" (NASM) generates non-aligned records.

When LLVM is built with -DLLVM_ENABLE_ASSERTIONS=ON, an assert was triggered in MergingTypeTableBuilder when non-ghash merging was used.
With ghash merging there was no assert.
As a result, when given such records, LLD could silently generate a non-aligned TPI stream, either with ghash merging or in builds without assertions.

We now align records on 4 bytes when record indices are remapped, in TypeStreamMerger::remapIndices(): each record is padded with LF_PAD bytes up to the next multiple of 4, and its RecordLen is increased by the same amount.
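A minimal sketch of that padding step, written as standalone C++ rather than against the real LLVM types. It assumes the record prefix layout used in this diff (a 2-byte little-endian `RecordLen` that does not count itself, followed by a 2-byte `RecordKind`) and hard-codes `LF_PAD4` as 0xF4; `padTypeRecord` is a hypothetical name, not an LLVM API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// CodeView padding leaves run from LF_PAD0 (0xF0) to LF_PAD15 (0xFF);
// only LF_PAD4 is needed here, as in the patch.
constexpr uint8_t LF_PAD4 = 0xF4;

// Hypothetical helper: pads one serialized type record (RecordLen, RecordKind,
// payload) so that its total size is a multiple of 4, and grows RecordLen by
// the number of pad bytes appended.
std::vector<uint8_t> padTypeRecord(std::vector<uint8_t> Rec) {
  assert(Rec.size() >= 4 && "a record needs at least its 4-byte prefix");
  unsigned Align = Rec.size() & 3;  // bytes past the last 4-byte boundary
  if (Align == 0)
    return Rec;  // already a multiple of 4, nothing to do

  uint16_t Len = static_cast<uint16_t>(Rec[0] | (Rec[1] << 8));
  Len = static_cast<uint16_t>(Len + (4 - Align));
  Rec[0] = static_cast<uint8_t>(Len & 0xFF);
  Rec[1] = static_cast<uint8_t>(Len >> 8);

  // Same loop shape as remapIndices(): appends F3 F2 F1, F2 F1, or F1,
  // so the low nibble of each pad byte counts the bytes left, itself included.
  for (; Align < 4; ++Align)
    Rec.push_back(static_cast<uint8_t>(LF_PAD4 - Align));
  return Rec;
}
```

Fed the 18-byte LF_PROCEDURE record from the NASM-generated `SectionData` in the test, this sketch should produce the same 20 bytes the test's CHECK line expects: `12000810 03000000 00000000 00000000 0000F2F1`. Records whose size is already a multiple of 4 are returned unchanged.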

Event Timeline

aganea created this revision (Feb 24 2020, 2:35 PM).

Herald added a reviewer: jhenderson (Feb 24 2020, 2:35 PM).

Herald added a project: Restricted Project.

Herald added subscribers: llvm-commits, rupprecht, hiraditya.

aganea marked an inline comment as done (Feb 24 2020, 2:41 PM).

aganea added inline comments.

llvm/include/llvm/DebugInfo/CodeView/GlobalTypeTableBuilder.h
74	aganea: I just duplicated these two lines from MergingTypeTableBuilder, but I think the check is wrong: it should say `RecordSize < MaxRecordLength` (which is 0xFF00). Changing it breaks the `long-name.ll` test; I could send a patch later.

Updated with a better test.
I've moved the test into LLD because I can't find a code path in LLVM that uses the TypeStreamMerger with OBJs as inputs and a PDB as output. The closest is `llvm-pdbutil merge`, but that only takes TypeServer PDBs as inputs.

Adding more people:
+@amccarth
+@ruiu for the LLD test.
+@hans: This occurs in release/10.x, but it has probably been there for a long time.

lgtm

Thanks! I guess we never encountered non-aligned type records.

lld/test/COFF/pdb-tpi-aligned-records.test
22	rnk: Are you sure we don't ignore SectionData? If you confirm the test fails if you remove the LF_PAD code before committing, that's good enough for me. I guess to be doubly sure we could remove the `Types:` block below. I would expect yaml2obj to fall back to SectionData.

This revision is now accepted and ready to land (Mar 10 2020, 4:30 PM).

Thanks for reviewing this, Reid!

lld/test/COFF/pdb-tpi-aligned-records.test
22	aganea: If I comment out `SectionData:`, the test fails because the TPI record now starts with `0E000810` (the source type stream is generated by LLVM) instead of `12000810` (the source type stream is generated by NASM). This is because NASM generates wrongly-sized LF_PROCEDURE records; I've filed a bug (https://bugzilla.nasm.us/show_bug.cgi?id=3392651). The `SectionData:` stream above was copy-pasted from a NASM-generated OBJ file. If I remove the changes to the code, the test fails, and the records that follow become unaligned. I only left `Types:` for information purposes, but I can comment it out to make it obvious that `SectionData:` is used.
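The two prefixes quoted above can be decoded by hand. The C++ sketch below mirrors the LF_PROCEDURE fields from the test's `Types:` block in a plain struct; `ProcedureRecordBytes` and the helper functions are hypothetical, and the field widths are assumed to be the usual CodeView ones (4-byte type indices, 1-byte calling convention and options, 2-byte parameter count), which match the bytes in `SectionData`. It checks at compile time that a well-formed record is 16 bytes (`RecordLen` 0x0E), whereas NASM's `RecordLen` of 0x10 describes an 18-byte record that pads to 20 bytes (`RecordLen` 0x12).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of a serialized LF_PROCEDURE record, prefix included.
// Every field is naturally aligned, so the compiler inserts no padding and
// sizeof() matches the on-disk size.
struct ProcedureRecordBytes {
  uint16_t RecordLen;       // bytes that follow this field
  uint16_t RecordKind;      // 0x1008 (LF_PROCEDURE), stored as bytes 08 10
  uint32_t ReturnType;      // type index (3 in the test)
  uint8_t CallConv;         // NearC
  uint8_t Options;          // None
  uint16_t ParameterCount;  // 0
  uint32_t ArgumentList;    // type index of the LF_ARGLIST
};
static_assert(sizeof(ProcedureRecordBytes) == 16, "unexpected struct padding");

// RecordLen does not count its own 2 bytes.
constexpr size_t recordLenFor(size_t TotalSize) { return TotalSize - 2; }
constexpr size_t totalSizeFor(size_t RecordLen) { return RecordLen + 2; }
constexpr size_t alignTo4(size_t N) { return (N + 3) & ~size_t(3); }

// LLVM-generated record: 16 bytes, so RecordLen is 0x0E ("0E000810").
static_assert(recordLenFor(sizeof(ProcedureRecordBytes)) == 0x0E, "");
// NASM-generated record: RecordLen 0x10 means 18 bytes, not a multiple of 4.
static_assert(totalSizeFor(0x10) == 18 && totalSizeFor(0x10) % 4 != 0, "");
// Padded to 20 bytes, RecordLen becomes 0x12 ("12000810").
static_assert(recordLenFor(alignTo4(totalSizeFor(0x10))) == 0x12, "");
```

The two extra bytes NASM counts are the trailing `0000` before the LF_ARGLIST record in `SectionData`.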

I find the assert messages and some of the comments mildly misleading. In my mind, there's a difference between a record's size and a record's alignment. One way to achieve alignment is to start at an aligned address and make sure each record has a size that's a multiple of the alignment. That seems to be the approach here, and that's fine. But I'm concerned the wording could mislead someone who ends up trying to diagnose problems in this vicinity.

Not knowing the details of how the records are actually read back and used, I harbor some concern that padding out the size of the record might confuse the consumer. If a record actually works out to 43 bytes, this code will copy the 43 bytes, add one pad byte, and change the record size to 44. Is it important that the consumer know the original size was 43 so that they don't mistake the pad byte for actual record data?

llvm/include/llvm/DebugInfo/CodeView/GlobalTypeTableBuilder.h
74	amccarth: Or perhaps `RecordSize <= MaxRecordLength`. If it's one byte short of MaxRecordLength, then it should have been rounded up to the next multiple of 4 bytes, so MaxRecordLength itself is a legal size, right?
75	amccarth: I know you just copied these lines, but the assert message is slightly misleading. Record size and alignment are related but different things, so it might be better to say something like "RecordSize is not a multiple of 4 bytes which will cause misalignment!"

In D75081#1917209, @amccarth wrote:

I find the assert messages and some of the comments mildly misleading.

Fixed; please check the updated diff and let me know if that's better.

Not knowing the details of how the records are actually read back and used, I harbor some concern that padding out the size of the record might confuse the consumer. If a record actually works out to 43 bytes, this code will copy the 43 bytes, add one pad byte, and change the record size to 44. Is it important that the consumer know the original size was 43 so that they don't mistake the pad byte for actual record data?

The consumer only uses llvm::codeview::RecordPrefix::RecordKind, which maps to the LF_* types (see TypeRecord.h). The record length, llvm::codeview::RecordPrefix::RecordLen (which we're modifying in this patch), is only used by the Visual Studio debugger to quickly skim through a PDB: https://github.com/microsoft/microsoft-pdb/blob/082c5290e5aff028ae84e43affa8be717aa7af73/PDB/include/symtypeutils.h#L190. I suppose one reason for this 4-byte alignment is that unaligned reads were more expensive when the spec and code were written.
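To illustrate the skimming use of `RecordLen` described above, here is a hypothetical C++ sketch (`skimTypeRecords` and `RecordInfo` are illustrative names, not LLVM or Microsoft APIs). A reader that hops from record to record by `RecordLen` only lands on the next prefix if `RecordLen` covers the appended pad bytes, which is why the patch bumps it instead of leaving the unpadded value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative result type: where a record starts and what kind it is.
struct RecordInfo {
  size_t Offset;  // byte offset of the record's RecordLen field
  uint16_t Kind;  // LF_* leaf kind, e.g. 0x1008 for LF_PROCEDURE
};

// Hypothetical helper: walks back-to-back CodeView type records using only
// each record's RecordLen, without decoding the payloads. RecordLen does not
// count its own 2 bytes, so the next record starts RecordLen + 2 bytes later.
// Returns an empty vector if the buffer is truncated or malformed.
std::vector<RecordInfo> skimTypeRecords(const std::vector<uint8_t> &Stream) {
  std::vector<RecordInfo> Out;
  size_t Off = 0;
  while (Off < Stream.size()) {
    if (Off + 4 > Stream.size())
      return {};  // not enough bytes left for a RecordLen + RecordKind prefix
    size_t Len = Stream[Off] | (Stream[Off + 1] << 8);
    uint16_t Kind =
        static_cast<uint16_t>(Stream[Off + 2] | (Stream[Off + 3] << 8));
    if (Len < 2 || Off + 2 + Len > Stream.size())
      return {};  // RecordLen must cover RecordKind and stay inside the buffer
    Out.push_back({Off, Kind});
    Off += 2 + Len;
  }
  return Out;
}
```

With the padded LF_PROCEDURE record (20 bytes, `RecordLen` 0x12) followed by an empty LF_ARGLIST (8 bytes), the walk visits offsets 0 and 20, and the pad bytes are never interpreted as a record or as field data.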

llvm/include/llvm/DebugInfo/CodeView/GlobalTypeTableBuilder.h
74	aganea: Microsoft does an exclusive check; perhaps we should do the same: https://github.com/microsoft/microsoft-pdb/blob/082c5290e5aff028ae84e43affa8be717aa7af73/PDB/dbi/tpi.cpp#L1130
75	aganea: Fixed.

Thanks!

Thanks for the feedback, Adrian!

Closed by commit rGa7325298e1f3: [CodeView] Align type records on 4-bytes when emitting PDBs (authored by aganea) (Mar 13 2020, 9:40 AM).

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                             Size
lld/test/COFF/pdb-tpi-aligned-records.test                       43 lines
llvm/include/llvm/DebugInfo/CodeView/GlobalTypeTableBuilder.h    5 lines
llvm/lib/DebugInfo/CodeView/MergingTypeTableBuilder.cpp          4 lines
llvm/lib/DebugInfo/CodeView/TypeStreamMerger.cpp                 24 lines
llvm/lib/DebugInfo/PDB/Native/TpiStreamBuilder.cpp               10 lines

Diff 249720

lld/test/COFF/pdb-tpi-aligned-records.test

This file was added.

# RUN: yaml2obj < %s > %t.obj
# RUN: yaml2obj %p/Inputs/generic.yaml > %t2.obj

# RUN: lld-link /out:%t.exe /debug /entry:main %t.obj %t2.obj /nodefaultlib
# RUN: llvm-pdbutil dump --types --type-data %t.pdb | FileCheck %s
# CHECK: 0000: 12000810 03000000 00000000 00000000 0000F2F1

--- !COFF
header:
  Machine: IMAGE_FILE_MACHINE_AMD64
  Characteristics: []
sections:
  - Name: '.debug$T'
    Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_DISCARDABLE, IMAGE_SCN_MEM_READ ]
    Alignment: 1
    # It is important to keep the 'SectionData' since the .OBJ is reconstructed from it,
    # and that triggers an alignment bug in the output .PDB.
    SectionData: '040000001000081003000000000000000000000000000600011200000000'
    Types:
      - Kind: LF_PROCEDURE
        Procedure:
          ReturnType: 3
          CallConv: NearC
          Options: [ None ]
          ParameterCount: 0
          ArgumentList: 0
      - Kind: LF_ARGLIST
        ArgList:
          ArgIndices: [ ]
symbols:
  - Name: '.debug$T'
    Value: 0
    SectionNumber: 1
    SimpleType: IMAGE_SYM_TYPE_NULL
    ComplexType: IMAGE_SYM_DTYPE_NULL
    StorageClass: IMAGE_SYM_CLASS_STATIC
    SectionDefinition:
      Length: 30
      NumberOfRelocations: 0
      NumberOfLinenumbers: 0
      CheckSum: 0
      Number: 0
...
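As a cross-check of the `SectionData` above, a small C++ sketch can split the section into type records. `fromHex` and `typeRecordSizes` are hypothetical helpers, and the sketch assumes the layout visible in the hex string: a 4-byte signature `04000000`, then records that each start with a 2-byte little-endian `RecordLen`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: converts a yaml2obj-style SectionData hex string to bytes.
std::vector<uint8_t> fromHex(const std::string &Hex) {
  std::vector<uint8_t> Bytes;
  for (size_t I = 0; I + 1 < Hex.size(); I += 2)
    Bytes.push_back(
        static_cast<uint8_t>(std::stoul(Hex.substr(I, 2), nullptr, 16)));
  return Bytes;
}

// Hypothetical helper: returns the total size (prefix included) of each type
// record in a .debug$T section body that starts with the 4-byte signature 4.
// Returns an empty vector on malformed input.
std::vector<size_t> typeRecordSizes(const std::vector<uint8_t> &Section) {
  if (Section.size() < 4 || Section[0] != 4 || Section[1] != 0 ||
      Section[2] != 0 || Section[3] != 0)
    return {};
  std::vector<size_t> Sizes;
  size_t Off = 4;  // skip the signature
  while (Off < Section.size()) {
    if (Off + 2 > Section.size())
      return {};
    size_t Len = Section[Off] | (Section[Off + 1] << 8);
    if (Off + 2 + Len > Section.size())
      return {};
    Sizes.push_back(2 + Len);  // RecordLen excludes its own 2 bytes
    Off += 2 + Len;
  }
  return Sizes;
}
```

For this test's `SectionData`, the sketch should report an 18-byte LF_PROCEDURE record and an 8-byte LF_ARGLIST record. Together with the 4-byte signature that adds up to the `Length: 30` in the section definition, and the 18-byte record is the one whose size is not a multiple of 4.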

llvm/include/llvm/DebugInfo/CodeView/GlobalTypeTableBuilder.h

@@ 65 lines not shown @@ public:
   BumpPtrAllocator &getAllocator() { return RecordStorage; }

   ArrayRef<ArrayRef<uint8_t>> records() const;
   ArrayRef<GloballyHashedType> hashes() const;

   template <typename CreateFunc>
   TypeIndex insertRecordAs(GloballyHashedType Hash, size_t RecordSize,
                            CreateFunc Create) {
+    assert(RecordSize < UINT32_MAX && "Record too big");
+    assert(RecordSize % 4 == 0 &&
+           "RecordSize is not a multiple of 4 bytes which will cause "
+           "misalignment in the output TPI stream!");

     auto Result = HashedRecords.try_emplace(Hash, nextTypeIndex());

     if (LLVM_UNLIKELY(Result.second /*inserted*/ ||
                       Result.first->second.isSimple())) {
       uint8_t *Stable = RecordStorage.Allocate<uint8_t>(RecordSize);
       MutableArrayRef<uint8_t> Data(Stable, RecordSize);
       ArrayRef<uint8_t> StableRecord = Create(Data);
       if (StableRecord.empty()) {
@@ 34 lines not shown @@

llvm/lib/DebugInfo/CodeView/MergingTypeTableBuilder.cpp

@@ 84 lines not shown @@ static inline ArrayRef<uint8_t> stabilize(BumpPtrAllocator &Alloc,
   uint8_t *Stable = Alloc.Allocate<uint8_t>(Data.size());
   memcpy(Stable, Data.data(), Data.size());
   return makeArrayRef(Stable, Data.size());
 }

 TypeIndex MergingTypeTableBuilder::insertRecordAs(hash_code Hash,
                                                   ArrayRef<uint8_t> &Record) {
   assert(Record.size() < UINT32_MAX && "Record too big");
-  assert(Record.size() % 4 == 0 && "Record is not aligned to 4 bytes!");
+  assert(Record.size() % 4 == 0 &&
+         "The type record size is not a multiple of 4 bytes which will cause "
+         "misalignment in the output TPI stream!");

   LocallyHashedType WeakHash{Hash, Record};
   auto Result = HashedRecords.try_emplace(WeakHash, nextTypeIndex());

   if (Result.second) {
     ArrayRef<uint8_t> RecordData = stabilize(RecordStorage, Record);
     Result.first->first.RecordData = RecordData;
     SeenRecords.push_back(RecordData);
@@ 22 lines not shown @@

llvm/lib/DebugInfo/CodeView/TypeStreamMerger.cpp

@@ 354 lines not shown @@ if (!R)
     return R.takeError();

   TypeIndex DestIdx = Untranslated;
   if (*R) {
     auto DoSerialize =
         [this, Type](MutableArrayRef<uint8_t> Storage) -> ArrayRef<uint8_t> {
       return remapIndices(Type, Storage);
     };
+    unsigned AlignedSize = alignTo(Type.RecordData.size(), 4);

     if (LLVM_LIKELY(UseGlobalHashes)) {
       GlobalTypeTableBuilder &Dest =
           isIdRecord(Type.kind()) ? DestGlobalIdStream : DestGlobalTypeStream;
       GloballyHashedType H = GlobalHashes[CurIndex.toArrayIndex()];
-      DestIdx = Dest.insertRecordAs(H, Type.RecordData.size(), DoSerialize);
+      DestIdx = Dest.insertRecordAs(H, AlignedSize, DoSerialize);
     } else {
       MergingTypeTableBuilder &Dest =
           isIdRecord(Type.kind()) ? DestIdStream : DestTypeStream;

-      RemapStorage.resize(Type.RecordData.size());
+      RemapStorage.resize(AlignedSize);
       ArrayRef<uint8_t> Result = DoSerialize(RemapStorage);
       if (!Result.empty())
         DestIdx = Dest.insertRecordBytes(Result);
     }
   }
   addMapping(DestIdx);

   ++CurIndex;
   assert((IsSecondPass || IndexMap.size() == slotForIndex(CurIndex)) &&
          "visitKnownRecord should add one index map entry");
   return Error::success();
 }

 ArrayRef<uint8_t>
 TypeStreamMerger::remapIndices(const CVType &OriginalType,
                                MutableArrayRef<uint8_t> Storage) {
+  unsigned Align = OriginalType.RecordData.size() & 3;
+  unsigned AlignedSize = alignTo(OriginalType.RecordData.size(), 4);
+  assert(Storage.size() == AlignedSize &&
+         "The storage buffer size is not a multiple of 4 bytes which will "
+         "cause misalignment in the output TPI stream!");

   SmallVector<TiReference, 4> Refs;
   discoverTypeIndices(OriginalType.RecordData, Refs);
-  if (Refs.empty())
+  if (Refs.empty() && Align == 0)
     return OriginalType.RecordData;

   ::memcpy(Storage.data(), OriginalType.RecordData.data(),
            OriginalType.RecordData.size());

   uint8_t *DestContent = Storage.data() + sizeof(RecordPrefix);

   for (auto &Ref : Refs) {
     TypeIndex *DestTIs =
         reinterpret_cast<TypeIndex *>(DestContent + Ref.Offset);

     for (size_t I = 0; I < Ref.Count; ++I) {
       TypeIndex &TI = DestTIs[I];
       bool Success = (Ref.Kind == TiRefKind::IndexRef) ? remapItemIndex(TI)
                                                        : remapTypeIndex(TI);
       if (LLVM_UNLIKELY(!Success))
         return {};
     }
   }

+  if (Align > 0) {
+    RecordPrefix *StorageHeader =
+        reinterpret_cast<RecordPrefix *>(Storage.data());
+    StorageHeader->RecordLen += 4 - Align;

+    DestContent = Storage.data() + OriginalType.RecordData.size();
+    for (; Align < 4; ++Align)
+      *DestContent++ = LF_PAD4 - Align;
+  }
   return Storage;
 }

 Error llvm::codeview::mergeTypeRecords(MergingTypeTableBuilder &Dest,
                                        SmallVectorImpl<TypeIndex> &SourceToDest,
                                        const CVTypeArray &Types) {
   TypeStreamMerger M(SourceToDest);
   return M.mergeTypeRecords(Dest, Types);
@@ 61 lines not shown @@

llvm/lib/DebugInfo/PDB/Native/TpiStreamBuilder.cpp

@@ 38 lines not shown @@

 void TpiStreamBuilder::setVersionHeader(PdbRaw_TpiVer Version) {
   VerHeader = Version;
 }

 void TpiStreamBuilder::addTypeRecord(ArrayRef<uint8_t> Record,
                                      Optional<uint32_t> Hash) {
   // If we just crossed an 8KB threshold, add a type index offset.
+  assert(((Record.size() & 3) == 0) &&
+         "The type record's size is not a multiple of 4 bytes which will "
+         "cause misalignment in the output TPI stream!");
   size_t NewSize = TypeRecordBytes + Record.size();
   constexpr size_t EightKB = 8 * 1024;
   if (NewSize / EightKB > TypeRecordBytes / EightKB || TypeRecords.empty()) {
     TypeIndexOffsets.push_back(
         {codeview::TypeIndex(codeview::TypeIndex::FirstNonSimpleIndex +
                              TypeRecords.size()),
          ulittle32_t(TypeRecordBytes)});
   }
@@ 93 lines not shown @@ Error TpiStreamBuilder::commit(const msf::MSFLayout &Layout,
   auto InfoS = WritableMappedBlockStream::createIndexedStream(Layout, Buffer,
                                                               Idx, Allocator);

   BinaryStreamWriter Writer(*InfoS);
   if (auto EC = Writer.writeObject(*Header))
     return EC;

   for (auto Rec : TypeRecords) {
-    assert(!Rec.empty()); // An empty record will not write anything, but it
-                          // would shift all offsets from here on.
+    assert(!Rec.empty() && "Attempting to write an empty type record shifts "
+                           "all offsets in the TPI stream!");
+    assert(((Rec.size() & 3) == 0) &&
+           "The type record's size is not a multiple of 4 bytes which will "
+           "cause misalignment in the output TPI stream!");
     if (auto EC = Writer.writeBytes(Rec))
       return EC;
   }

   if (HashStreamIndex != kInvalidStreamIndex) {
     auto HVS = WritableMappedBlockStream::createIndexedStream(
         Layout, Buffer, HashStreamIndex, Allocator);
     BinaryStreamWriter HW(*HVS);
@@ 13 lines not shown @@
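In `TpiStreamBuilder::addTypeRecord`, the new assert sits just before the bookkeeping that records a (type index, byte offset) pair whenever the stream crosses an 8KB boundary. The hypothetical C++ sketch below re-creates that bookkeeping over a list of record sizes (`typeIndexOffsets` is an illustrative name). It assumes that the part of the function not shown in the diff appends the record and grows `TypeRecordBytes` by its size. Because each offset is a sum of earlier record sizes, requiring every size to be a multiple of 4 also keeps every recorded offset 4-byte aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// LLVM's TypeIndex::FirstNonSimpleIndex is 0x1000; smaller indices denote
// simple (built-in) types.
constexpr uint32_t FirstNonSimpleIndex = 0x1000;
constexpr size_t EightKB = 8 * 1024;

// Hypothetical re-creation of the offset bookkeeping: returns (type index,
// byte offset) for the first record and for each record that crosses an
// 8KB boundary of the running byte count.
std::vector<std::pair<uint32_t, size_t>>
typeIndexOffsets(const std::vector<size_t> &RecordSizes) {
  std::vector<std::pair<uint32_t, size_t>> Offsets;
  size_t TypeRecordBytes = 0;
  for (size_t I = 0; I < RecordSizes.size(); ++I) {
    assert(RecordSizes[I] % 4 == 0 && "record size must be a multiple of 4");
    size_t NewSize = TypeRecordBytes + RecordSizes[I];
    if (NewSize / EightKB > TypeRecordBytes / EightKB || I == 0)
      Offsets.push_back(
          {FirstNonSimpleIndex + static_cast<uint32_t>(I), TypeRecordBytes});
    TypeRecordBytes = NewSize;  // assumed: the record is then appended
  }
  return Offsets;
}
```

For example, 500 records of 20 bytes each yield two entries: type index 0x1000 at offset 0, and type index 0x1000 + 409 at offset 8180, since record 409 is the one whose bytes straddle the 8192-byte mark.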