This is an archive of the discontinued LLVM Phabricator instance.

Differential D21393

[pdbdump] Verify LF_{CLASS,ENUM,INTERFACE,STRUCTURE,UNION} records with different hash function.
ClosedPublic

Authored by ruiu on Jun 15 2016, 11:42 AM.

Details

Reviewers

zturner

Commits

rG74c4341dde8a: [codeview] Use hashBufferV8 to verify all type records.
rL272930: [codeview] Use hashBufferV8 to verify all type records.

Summary

This patch adds a new hash function, hashBufferV8, and uses it to verify TPI records.

Diff Detail

Event Timeline

ruiu updated this revision to Diff 60870. Jun 15 2016, 11:42 AM

ruiu retitled this revision to [pdbdump] Verify LF_{CLASS,ENUM,INTERFACE,STRUCTURE,UNION} records with different hash function.

ruiu updated this object.

ruiu added a reviewer: zturner.

ruiu added a subscriber: llvm-commits.

I think we should have some tests confirming that these functions produce the same values that are in MS-generated PDBs for the same record contents.

Do we already have unit tests for PDB?

Ah, we have tests under unittests/DebugInfo/PDB. I'll add a new file there.

Actually, empty.pdb has type records that use this hash function, so if it's broken the test would break. You can confirm that by returning some random value from hashBufferV8. Do we need more tests?

Well, but empty.pdb is not generated by us. Since we're now generating the hash, rather than just reading the hash that someone else generated, I think we should write a test that verifies that a) we're using the correct hash function, and b) we're hashing the right buffers. So I was thinking of a test that would work like the following:

1. Read type records and the corresponding hashes out of empty.pdb.
2. Use our functions to hash those records and generate hash values.
3. Verify that the hash we generate is the same one that was found in the PDB file.

Similar to how you ran that test locally a week or so ago to check whether the hashes match so we know we're using it correctly, but in a test.
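The three steps above can be sketched in isolation. This is only an illustration, not code from the patch: `StoredRecord`, `firstHashMismatch`, and `toyByteSum` are hypothetical names, and the toy byte-sum hash stands in for whichever PDB hash function applies to a given record.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// One record's bytes plus the hash value stored next to it in the file.
// Hypothetical type, used only for this sketch.
struct StoredRecord {
  std::vector<uint8_t> Bytes;
  uint32_t StoredHash;
};

using HashFn = std::function<uint32_t(const std::vector<uint8_t> &)>;

// Recomputes the hash of every record and compares it with the stored value.
// Returns the index of the first mismatch, or Records.size() if all match.
size_t firstHashMismatch(const std::vector<StoredRecord> &Records,
                         const HashFn &Hash) {
  for (size_t I = 0; I < Records.size(); ++I)
    if (Hash(Records[I].Bytes) != Records[I].StoredHash)
      return I;
  return Records.size();
}

// Toy stand-in hash (sum of bytes); a real test would plug in the PDB hash.
uint32_t toyByteSum(const std::vector<uint8_t> &Bytes) {
  uint32_t Sum = 0;
  for (uint8_t B : Bytes)
    Sum += B;
  return Sum;
}
```

A unit test would fill `Records` from a known-good PDB and expect `firstHashMismatch` to return `Records.size()`.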

I think that's exactly what we are doing -- in this code, we read type records, compute hash values using our hash function, and compare them with the hash values stored in the PDB file. If they are different, pdbdump reports that the file is corrupted.

Ahh, I think I get it. I'm a little concerned about the performance impact
of this though. The whole reason the hash is stored in the file is so that
it doesn't have to recompute the hash up front. If we're re-computing it
every time we load the type streams, it seems like it will be a big
performance problem. I think we should only compute the hash when we
modify a record and/or write the file.

Yeah, I don't think this code will live here forever. I'm adding code here to verify that my understanding of how the hash values are computed is correct. Once everything is set, I'll probably remove it from here and move this to a writer.

In D21393#459087, @ruiu wrote:

Yeah, I don't think this code will live here forever. I'm adding code here to verify that my understanding of how the hash values are computed is correct. Once everything is set, I'll probably remove it from here and move this to a writer.

Ok, for the time being can you add a command line option --verify-tpi-hash-integrity and only do this check if it is set? Sometimes I dump really large files, and I'm afraid this would double the amount of time it takes which is already >= 1 minute in some cases.
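A minimal sketch of gating the check behind such a flag, written as plain C++ so it stands alone. `DumpOptions` and `parseOptions` are hypothetical names; an LLVM tool would more likely declare the flag with its own command-line library.

```cpp
#include <cstring>

// Hypothetical options bag for the dumper; only the flag discussed here.
struct DumpOptions {
  bool VerifyTpiHashIntegrity = false; // off by default, so large dumps stay fast
};

// Scans argv for the flag named in the review comment.
DumpOptions parseOptions(int Argc, const char *const *Argv) {
  DumpOptions Opts;
  for (int I = 1; I < Argc; ++I)
    if (std::strcmp(Argv[I], "--verify-tpi-hash-integrity") == 0)
      Opts.VerifyTpiHashIntegrity = true;
  return Opts;
}
```

The loader would then run the verification pass only when `VerifyTpiHashIntegrity` is true.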

Rebased

zturner added inline comments. Jun 16 2016, 11:12 AM

lib/DebugInfo/PDB/Raw/TpiStream.cpp

I was imagining that we wouldn't even have this TpiHashVerifier class, and that everything would just go in TypeDumper. Does this not work? For example, imagine you delete all of this code, and delete the TpiHashVerifier class, and delete lines 205-209. Then, in TypeDumper.cpp, you change CVTypeDumper::visitClass to look like this:

Error CVTypeDumper::visitClass(ClassRecord &Class) {
  uint16_t Props = static_cast<uint16_t>(Class.getOptions());
  W->printNumber("MemberCount", Class.getMemberCount());
  W->printFlags("Properties", Props, makeArrayRef(ClassOptionNames));
  printTypeIndex("FieldList", Class.getFieldList());
  printTypeIndex("DerivedFrom", Class.getDerivationList());
  printTypeIndex("VShape", Class.getVTableShape());
  W->printNumber("SizeOf", Class.getSize());
  W->printString("Name", Class.getName());
  if (Props & uint16_t(ClassOptions::HasUniqueName))
    W->printString("LinkageName", Class.getUniqueName());
  Name = Class.getName();

  /* NEW CODE HERE */
  if (auto EC = verifyHash(Class))
    return EC;
  return Error::success();
}

With the current implementation, we walk the entire type array twice. Once to verify hashes, and once to dump them. So if we could do it all in one pass, it seems better.

It may work (though I didn't try), but I think the current structure is better since we are going to remove this code from here and add code to actually generate hash values to the writer. In the current structure, the code to generate hash values is separated and concentrated in one place. That's better than scattering it across multiple methods.

zturner accepted this revision. Jun 16 2016, 11:38 AM

zturner edited edge metadata.

This revision is now accepted and ready to land. Jun 16 2016, 11:38 AM

Closed by commit rL272930: [codeview] Use hashBufferV8 to verify all type records. (authored by ruiu). Jun 16 2016, 11:46 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                          Size
include/llvm/DebugInfo/CodeView/CVRecord.h    11 lines
include/llvm/DebugInfo/PDB/Raw/Hash.h         2 lines
lib/DebugInfo/PDB/Raw/Hash.cpp                55 lines
lib/DebugInfo/PDB/Raw/TpiStream.cpp           26 lines

Diff 60870

include/llvm/DebugInfo/CodeView/CVRecord.h

[17 unchanged lines not shown]
 #include "llvm/Support/Endian.h"

 namespace llvm {
 namespace codeview {

 template <typename Kind> struct CVRecord {
   uint32_t Length;
   Kind Type;
-  ArrayRef<uint8_t> Data;
+  ArrayRef<uint8_t> Data; // Data without RecordPrefix
+  ArrayRef<uint8_t> RawData; // Data with RecordPrefix
 };

 template <typename Kind> struct VarStreamArrayExtractor<CVRecord<Kind>> {
   Error operator()(StreamRef Stream, uint32_t &Len,
                    CVRecord<Kind> &Item) const {
     const RecordPrefix *Prefix = nullptr;
     StreamReader Reader(Stream);
+    uint32_t Offset = Reader.getOffset();

     if (auto EC = Reader.readObject(Prefix))
       return EC;
     Item.Length = Prefix->RecordLen;
     if (Item.Length < 2)
       return make_error<CodeViewError>(cv_error_code::corrupt_record);
     Item.Type = static_cast<Kind>(uint16_t(Prefix->RecordKind));
-    if (auto EC = Reader.readBytes(Item.Data, Item.Length - 2))
+    Reader.setOffset(Offset);
+    if (auto EC =
+            Reader.readBytes(Item.RawData, Item.Length + sizeof(uint16_t)))
       return EC;
+    Item.Data = Item.RawData.slice(sizeof(RecordPrefix));
     Len = Prefix->RecordLen + 2;
     return Error::success();
   }
 };
 }
 }

 #endif
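The relationship between `RawData` and `Data` in this change can be sketched without any LLVM types. The sketch assumes a 4-byte prefix made of a little-endian 16-bit length followed by a 16-bit kind, where the length does not count its own two bytes, which is what `Item.Length + sizeof(uint16_t)` and `Len = Prefix->RecordLen + 2` imply. `RecordView` and `readRecord` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Byte ranges of one record inside a stream (hypothetical helper type).
struct RecordView {
  uint16_t Kind;
  size_t RawBegin, RawSize;   // whole record, prefix included ("RawData")
  size_t DataBegin, DataSize; // payload after the prefix ("Data")
};

constexpr size_t PrefixSize = 4; // assumed: uint16 length + uint16 kind

std::optional<RecordView> readRecord(const std::vector<uint8_t> &Stream,
                                     size_t Offset) {
  if (Offset > Stream.size() || Stream.size() - Offset < PrefixSize)
    return std::nullopt;
  auto Len = static_cast<uint16_t>(Stream[Offset] | (Stream[Offset + 1] << 8));
  auto Kind =
      static_cast<uint16_t>(Stream[Offset + 2] | (Stream[Offset + 3] << 8));
  if (Len < 2) // the length must at least cover the kind field
    return std::nullopt;
  size_t RawSize = static_cast<size_t>(Len) + 2; // add the length field itself
  if (Stream.size() - Offset < RawSize)
    return std::nullopt;
  return RecordView{Kind, Offset, RawSize, Offset + PrefixSize,
                    RawSize - PrefixSize};
}
```

Hashing the raw range rather than the payload is what requires keeping the prefix bytes around.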

include/llvm/DebugInfo/PDB/Raw/Hash.h

 //===- Hash.h - PDB hash functions ------------------------------*- C++ -*-===//
 //
 // The LLVM Compiler Infrastructure
 //
 // This file is distributed under the University of Illinois Open Source
 // License. See LICENSE.TXT for details.
 //
 //===----------------------------------------------------------------------===//

 #ifndef LLVM_DEBUGINFO_PDB_RAW_HASH_H
 #define LLVM_DEBUGINFO_PDB_RAW_HASH_H

+#include "llvm/ADT/ArrayRef.h"
 #include "llvm/ADT/StringRef.h"
 #include <stdint.h>

 namespace llvm {
 namespace pdb {
 uint32_t hashStringV1(StringRef Str);
 uint32_t hashStringV2(StringRef Str);
+uint32_t hashBufferV8(ArrayRef<uint8_t> Data);
 }
 }

 #endif

lib/DebugInfo/PDB/Raw/Hash.cpp

[69 unchanged lines not shown; enclosing function: uint32_t pdb::hashStringV2(StringRef Str) {]
   for (uint8_t Item : Buffer) {
     Hash += Item;
     Hash += (Hash << 10);
     Hash ^= (Hash >> 6);
   }

   return Hash * 1664525L + 1013904223L;
 }
+
+// This is a table to produce the same hash values as Microsoft tools do.
+static const uint32_t V8HashTable[] = {
+    0x00000000, 0x77073096, 0xEE0E612C, 0x990951BA, 0x076DC419, 0x706AF48F,
+    0xE963A535, 0x9E6495A3, 0x0EDB8832, 0x79DCB8A4, 0xE0D5E91E, 0x97D2D988,
+    0x09B64C2B, 0x7EB17CBD, 0xE7B82D07, 0x90BF1D91, 0x1DB71064, 0x6AB020F2,
+    0xF3B97148, 0x84BE41DE, 0x1ADAD47D, 0x6DDDE4EB, 0xF4D4B551, 0x83D385C7,
+    0x136C9856, 0x646BA8C0, 0xFD62F97A, 0x8A65C9EC, 0x14015C4F, 0x63066CD9,
+    0xFA0F3D63, 0x8D080DF5, 0x3B6E20C8, 0x4C69105E, 0xD56041E4, 0xA2677172,
+    0x3C03E4D1, 0x4B04D447, 0xD20D85FD, 0xA50AB56B, 0x35B5A8FA, 0x42B2986C,
+    0xDBBBC9D6, 0xACBCF940, 0x32D86CE3, 0x45DF5C75, 0xDCD60DCF, 0xABD13D59,
+    0x26D930AC, 0x51DE003A, 0xC8D75180, 0xBFD06116, 0x21B4F4B5, 0x56B3C423,
+    0xCFBA9599, 0xB8BDA50F, 0x2802B89E, 0x5F058808, 0xC60CD9B2, 0xB10BE924,
+    0x2F6F7C87, 0x58684C11, 0xC1611DAB, 0xB6662D3D, 0x76DC4190, 0x01DB7106,
+    0x98D220BC, 0xEFD5102A, 0x71B18589, 0x06B6B51F, 0x9FBFE4A5, 0xE8B8D433,
+    0x7807C9A2, 0x0F00F934, 0x9609A88E, 0xE10E9818, 0x7F6A0DBB, 0x086D3D2D,
+    0x91646C97, 0xE6635C01, 0x6B6B51F4, 0x1C6C6162, 0x856530D8, 0xF262004E,
+    0x6C0695ED, 0x1B01A57B, 0x8208F4C1, 0xF50FC457, 0x65B0D9C6, 0x12B7E950,
+    0x8BBEB8EA, 0xFCB9887C, 0x62DD1DDF, 0x15DA2D49, 0x8CD37CF3, 0xFBD44C65,
+    0x4DB26158, 0x3AB551CE, 0xA3BC0074, 0xD4BB30E2, 0x4ADFA541, 0x3DD895D7,
+    0xA4D1C46D, 0xD3D6F4FB, 0x4369E96A, 0x346ED9FC, 0xAD678846, 0xDA60B8D0,
+    0x44042D73, 0x33031DE5, 0xAA0A4C5F, 0xDD0D7CC9, 0x5005713C, 0x270241AA,
+    0xBE0B1010, 0xC90C2086, 0x5768B525, 0x206F85B3, 0xB966D409, 0xCE61E49F,
+    0x5EDEF90E, 0x29D9C998, 0xB0D09822, 0xC7D7A8B4, 0x59B33D17, 0x2EB40D81,
+    0xB7BD5C3B, 0xC0BA6CAD, 0xEDB88320, 0x9ABFB3B6, 0x03B6E20C, 0x74B1D29A,
+    0xEAD54739, 0x9DD277AF, 0x04DB2615, 0x73DC1683, 0xE3630B12, 0x94643B84,
+    0x0D6D6A3E, 0x7A6A5AA8, 0xE40ECF0B, 0x9309FF9D, 0x0A00AE27, 0x7D079EB1,
+    0xF00F9344, 0x8708A3D2, 0x1E01F268, 0x6906C2FE, 0xF762575D, 0x806567CB,
+    0x196C3671, 0x6E6B06E7, 0xFED41B76, 0x89D32BE0, 0x10DA7A5A, 0x67DD4ACC,
+    0xF9B9DF6F, 0x8EBEEFF9, 0x17B7BE43, 0x60B08ED5, 0xD6D6A3E8, 0xA1D1937E,
+    0x38D8C2C4, 0x4FDFF252, 0xD1BB67F1, 0xA6BC5767, 0x3FB506DD, 0x48B2364B,
+    0xD80D2BDA, 0xAF0A1B4C, 0x36034AF6, 0x41047A60, 0xDF60EFC3, 0xA867DF55,
+    0x316E8EEF, 0x4669BE79, 0xCB61B38C, 0xBC66831A, 0x256FD2A0, 0x5268E236,
+    0xCC0C7795, 0xBB0B4703, 0x220216B9, 0x5505262F, 0xC5BA3BBE, 0xB2BD0B28,
+    0x2BB45A92, 0x5CB36A04, 0xC2D7FFA7, 0xB5D0CF31, 0x2CD99E8B, 0x5BDEAE1D,
+    0x9B64C2B0, 0xEC63F226, 0x756AA39C, 0x026D930A, 0x9C0906A9, 0xEB0E363F,
+    0x72076785, 0x05005713, 0x95BF4A82, 0xE2B87A14, 0x7BB12BAE, 0x0CB61B38,
+    0x92D28E9B, 0xE5D5BE0D, 0x7CDCEFB7, 0x0BDBDF21, 0x86D3D2D4, 0xF1D4E242,
+    0x68DDB3F8, 0x1FDA836E, 0x81BE16CD, 0xF6B9265B, 0x6FB077E1, 0x18B74777,
+    0x88085AE6, 0xFF0F6A70, 0x66063BCA, 0x11010B5C, 0x8F659EFF, 0xF862AE69,
+    0x616BFFD3, 0x166CCF45, 0xA00AE278, 0xD70DD2EE, 0x4E048354, 0x3903B3C2,
+    0xA7672661, 0xD06016F7, 0x4969474D, 0x3E6E77DB, 0xAED16A4A, 0xD9D65ADC,
+    0x40DF0B66, 0x37D83BF0, 0xA9BCAE53, 0xDEBB9EC5, 0x47B2CF7F, 0x30B5FFE9,
+    0xBDBDF21C, 0xCABAC28A, 0x53B39330, 0x24B4A3A6, 0xBAD03605, 0xCDD70693,
+    0x54DE5729, 0x23D967BF, 0xB3667A2E, 0xC4614AB8, 0x5D681B02, 0x2A6F2B94,
+    0xB40BBE37, 0xC30C8EA1, 0x5A05DF1B, 0x2D02EF8D,
+};
+
+// Corresponds to `SigForPbCb` in langapi/shared/crc32.h.
+uint32_t pdb::hashBufferV8(ArrayRef<uint8_t> Buf) {
+  uint32_t Hash = 0;
+  for (uint8_t Byte : Buf)
+    Hash = (Hash >> 8) ^ V8HashTable[(Hash & 0xff) ^ Byte];
+  return Hash;
+}
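The 256 constants in this table match the standard reflected CRC-32 lookup table for polynomial 0xEDB88320, so the table can also be generated at run time. The sketch below does that and repeats the update loop from `hashBufferV8`; note that the loop starts from 0 and applies no final inversion, unlike the usual CRC-32 checksum. `makeCrc32Table` and `hashBufferV8Sketch` are hypothetical names, and `std::vector` stands in for `ArrayRef`.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Builds the reflected CRC-32 table (polynomial 0xEDB88320), one bit at a time.
std::array<uint32_t, 256> makeCrc32Table() {
  std::array<uint32_t, 256> Table{};
  for (uint32_t I = 0; I < 256; ++I) {
    uint32_t C = I;
    for (int Bit = 0; Bit < 8; ++Bit)
      C = (C & 1) ? (C >> 1) ^ 0xEDB88320u : (C >> 1);
    Table[I] = C;
  }
  return Table;
}

// Same update step as hashBufferV8: initial value 0, no final XOR.
uint32_t hashBufferV8Sketch(const std::vector<uint8_t> &Buf) {
  static const std::array<uint32_t, 256> Table = makeCrc32Table();
  uint32_t Hash = 0;
  for (uint8_t Byte : Buf)
    Hash = (Hash >> 8) ^ Table[(Hash & 0xff) ^ Byte];
  return Hash;
}
```

Comparing a few generated entries against the literal table (for example indices 1, 128, and 255) is a cheap way to catch transcription errors in either form.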

lib/DebugInfo/PDB/Raw/TpiStream.cpp

[64 unchanged lines not shown]

 TpiStream::TpiStream(const PDBFile &File,
                      std::unique_ptr<MappedBlockStream> Stream)
     : Pdb(File), Stream(std::move(Stream)), HashFunction(nullptr) {}

 TpiStream::~TpiStream() {}

 // Computes a hash for a given TPI record.
 template <typename T, codeview::TypeRecordKind K>
 static Error getTpiHash(const codeview::CVType &Rec, uint32_t &Hash) {
+  using namespace codeview;
+
   ArrayRef<uint8_t> Data = Rec.Data;
   ErrorOr<T> Obj = T::deserialize(K, Data);
   if (Obj.getError())
     return llvm::make_error<codeview::CodeViewError>(
         codeview::cv_error_code::corrupt_record);

   auto Opts = static_cast<uint16_t>(Obj->getOptions());
-  if (Opts & static_cast<uint16_t>(codeview::ClassOptions::ForwardReference)) {
-    // We don't know how to calculate a hash value for this yet.
-    // Currently we just skip it.
-    Hash = 0;
-    return Error::success();
-  }
+  bool ForwardRef =
+      Opts & static_cast<uint16_t>(ClassOptions::ForwardReference);
+  bool Scoped = Opts & static_cast<uint16_t>(ClassOptions::Scoped);
+  bool UniqueName = Opts & static_cast<uint16_t>(ClassOptions::HasUniqueName);

-  if (!(Opts & static_cast<uint16_t>(codeview::ClassOptions::Scoped))) {
+  if (!ForwardRef && !Scoped)
     Hash = hashStringV1(Obj->getName());
-    return Error::success();
-  }
-
-  if (Opts & static_cast<uint16_t>(codeview::ClassOptions::HasUniqueName)) {
+  else if (!ForwardRef && UniqueName)
     Hash = hashStringV1(Obj->getUniqueName());
-    return Error::success();
-  }
-
-  // This case is not implemented yet.
-  Hash = 0;
+  else
+    Hash = hashBufferV8(Rec.RawData);
   return Error::success();
 }

 // Verifies that a given type record matches with a given hash value.
 // Currently we only verify SRC_LINE records.
 static Error verifyTIHash(const codeview::CVType &Rec, uint32_t Expected,
                           uint32_t NumHashBuckets) {
   using namespace codeview;
[164 unchanged lines not shown]
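The selection logic in the new `getTpiHash` reduces to a three-way choice on the record's option bits. The sketch below isolates that choice so it can be checked case by case; `TpiHashKind` and `selectTpiHash` are hypothetical names, not part of the patch.

```cpp
// Which input/hash pair getTpiHash would use (hypothetical enum).
enum class TpiHashKind {
  NameV1,       // hashStringV1 over the record name
  UniqueNameV1, // hashStringV1 over the unique (linkage) name
  BufferV8      // hashBufferV8 over the raw record bytes, prefix included
};

// Mirrors the if / else-if / else chain in the patched getTpiHash.
TpiHashKind selectTpiHash(bool ForwardRef, bool Scoped, bool HasUniqueName) {
  if (!ForwardRef && !Scoped)
    return TpiHashKind::NameV1;
  if (!ForwardRef && HasUniqueName)
    return TpiHashKind::UniqueNameV1;
  return TpiHashKind::BufferV8;
}
```

Every forward reference, and every scoped record without a unique name, falls through to the raw-buffer hash, which replaces the two `Hash = 0` placeholders in the old code.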
