This is an archive of the discontinued LLVM Phabricator instance.


Differential D21393

[pdbdump] Verify LF_{CLASS,ENUM,INTERFACE,STRUCTURE,UNION} records with different hash function.
ClosedPublic

Authored by ruiu on Jun 15 2016, 11:42 AM.


Details

Reviewers

zturner

Commits

rG74c4341dde8a: [codeview] Use hashBufferV8 to verify all type records.
rL272930: [codeview] Use hashBufferV8 to verify all type records.

Summary

This patch adds a new hash function, hashBufferV8, and uses it to verify TPI records.

Diff Detail

Repository: rL LLVM

Event Timeline

ruiu updated this revision to Diff 60870. Jun 15 2016, 11:42 AM

ruiu retitled this revision to [pdbdump] Verify LF_{CLASS,ENUM,INTERFACE,STRUCTURE,UNION} records with different hash function.

ruiu updated this object.

ruiu added a reviewer: zturner.

ruiu added a subscriber: llvm-commits.

I think we should have some tests confirming that these functions produce the same values that are in MS-generated PDBs for the same record contents.

Do we already have unit tests for PDB?

Ah, we have tests under unittests/DebugInfo/PDB. I'll add a new file there.

Actually, empty.pdb has type records that use this hash function, so if it's broken the test would break. You can confirm that by returning some random value from hashBufferV8. Do we need more tests?

Well, but empty.pdb is not generated by us. Since we're now generating the hash, rather than just reading a hash that someone else generated, I think we should write a test that verifies that a) we're using the correct hash function, and b) we're hashing the right buffers. So I was thinking of a test that would work like the following:

Read type records, and the corresponding hashes, out of empty.pdb.
Use our functions to hash those records and generate hash values.
Verify that the hash we generate is the same one that was found in the PDB file.

Similar to how you ran that test locally a week or so ago to check whether the hashes match so we know we're using it correctly, but in a test.
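The three test steps above can be sketched as a small standalone loop. This is only an illustration under stated assumptions: `SketchRecord`, `sketchHash`, and `firstMismatch` are hypothetical names, and the FNV-1a placeholder stands in for the real PDB hash functions, which are not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a type record read from a PDB; not an LLVM type.
struct SketchRecord {
  std::string Bytes; // raw record contents
};

// Placeholder hash (FNV-1a) so the sketch is runnable.
// It is NOT the hash function a real PDB uses.
uint32_t sketchHash(const std::string &Bytes) {
  uint32_t H = 2166136261u;
  for (unsigned char C : Bytes) {
    H ^= C;
    H *= 16777619u;
  }
  return H;
}

// Recomputes the bucket for each record and compares it with the stored one.
// Returns the index of the first mismatch, or -1 if every record matches.
int firstMismatch(const std::vector<SketchRecord> &Records,
                  const std::vector<uint32_t> &StoredBuckets,
                  uint32_t NumHashBuckets) {
  assert(NumHashBuckets != 0 && "bucket count must be non-zero");
  for (std::size_t I = 0; I < Records.size(); ++I) {
    uint32_t Bucket = sketchHash(Records[I].Bytes) % NumHashBuckets;
    if (I >= StoredBuckets.size() || Bucket != StoredBuckets[I])
      return static_cast<int>(I);
  }
  return -1;
}
```

A real test would take the records and stored values from empty.pdb rather than building them in memory.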

I think that's exactly what we are doing -- in this code, we read type records, compute hash values using our hash function, and compare them with hash values on the PDB file. If they are different, pdbdump reports that the file is corrupted.

Ahh, I think I get it. I'm a little concerned about the performance impact
of this though. The whole reason the hash is stored in the file is so that
it doesn't have to recompute the hash up front. If we're re-computing it
every time we load the type streams, it seems like it will be a big
performance problem. I think we should only compute the hash when we
modify a record and/or write the file.

Yeah, I don't think this code will live here forever. I'm adding code here to verify that my understanding of how the hash values are computed is correct. Once everything is set, I'll probably remove it from here and move this to a writer.

In D21393#459087, @ruiu wrote:

Yeah, I don't think this code will live here forever. I'm adding code here to verify that my understanding of how the hash values are computed is correct. Once everything is set, I'll probably remove it from here and move this to a writer.

Ok, for the time being can you add a command line option --verify-tpi-hash-integrity and only do this check if it is set? Sometimes I dump really large files, and I'm afraid this would double the amount of time it takes which is already >= 1 minute in some cases.
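A minimal sketch of the gating requested here, assuming a plain argv scan. The flag name comes from the comment above; `DumpOptions`, `parseOptions`, and `runDump` are hypothetical names, and pdbdump's actual option handling is not shown in this review.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical options struct for this sketch only.
struct DumpOptions {
  bool VerifyTpiHashIntegrity = false;
};

// Scans the arguments for the opt-in flag; everything else is ignored here.
DumpOptions parseOptions(int Argc, const char *const *Argv) {
  assert(Argc >= 0 && "argument count must be non-negative");
  DumpOptions Opts;
  for (int I = 1; I < Argc; ++I)
    if (std::strcmp(Argv[I], "--verify-tpi-hash-integrity") == 0)
      Opts.VerifyTpiHashIntegrity = true;
  return Opts;
}

// Returns how many verification passes would run for the given options.
int runDump(const DumpOptions &Opts) {
  int VerifyPasses = 0;
  if (Opts.VerifyTpiHashIntegrity)
    ++VerifyPasses; // stand-in for the extra walk that recomputes hashes
  // Dumping itself would happen here regardless of the flag.
  return VerifyPasses;
}
```

With the check off by default, dumping a large file pays for the extra pass only when the user asks for it.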

Rebased

zturner added inline comments. Jun 16 2016, 11:12 AM

lib/DebugInfo/PDB/Raw/TpiStream.cpp

Line 69 (on Diff #60978)

I was imagining that we wouldn't even have this TpiHashVerifier class, and that everything would just go in TypeDumper. Does this not work? For example, imagine you delete all of this code, and delete the TpiHashVerifier class, and delete lines 205-209. Then, in TypeDumper.cpp, you change CVTypeDumper::visitClass to look like this:

Error CVTypeDumper::visitClass(ClassRecord &Class) {
  uint16_t Props = static_cast<uint16_t>(Class.getOptions());
  W->printNumber("MemberCount", Class.getMemberCount());
  W->printFlags("Properties", Props, makeArrayRef(ClassOptionNames));
  printTypeIndex("FieldList", Class.getFieldList());
  printTypeIndex("DerivedFrom", Class.getDerivationList());
  printTypeIndex("VShape", Class.getVTableShape());
  W->printNumber("SizeOf", Class.getSize());
  W->printString("Name", Class.getName());
  if (Props & uint16_t(ClassOptions::HasUniqueName))
    W->printString("LinkageName", Class.getUniqueName());
  Name = Class.getName();

  /* NEW CODE HERE */
  if (auto EC = verifyHash(Class))
    return EC;
  return Error::success();
}

With the current implementation, we walk the entire type array twice. Once to verify hashes, and once to dump them. So if we could do it all in one pass, it seems better.
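The single-pass idea can be illustrated with a generic sketch in which one walk over the records runs several callbacks per record. `RecordCallback` and `visitOnce` are hypothetical names and do not correspond to the LLVM visitor classes in this patch.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical callback type: one processing step applied to a record.
using RecordCallback = std::function<void(const std::string &)>;

// Walks the records once and runs every callback on each record, instead of
// walking the whole array once per callback. Returns the number of records
// visited.
int visitOnce(const std::vector<std::string> &Records,
              const std::vector<RecordCallback> &Callbacks) {
  int RecordsVisited = 0;
  for (const std::string &Rec : Records) {
    for (const RecordCallback &CB : Callbacks) {
      assert(CB && "callback must be callable");
      CB(Rec);
    }
    ++RecordsVisited;
  }
  return RecordsVisited;
}
```

Passing a dumping callback and a verifying callback together visits each record once while still keeping the two concerns in separate functions.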

It may work (though I didn't try), but I think the current structure is better since we are going to remove this code from here and add code that actually generates hash values to the writer. In the current structure, the code that generates hash values is kept separate and in one place, which is better than scattering it across multiple methods.

zturner accepted this revision. Jun 16 2016, 11:38 AM

zturner edited edge metadata.

This revision is now accepted and ready to land. Jun 16 2016, 11:38 AM

Closed by commit rL272930: [codeview] Use hashBufferV8 to verify all type records. (authored by ruiu). Jun 16 2016, 11:46 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path: llvm/trunk/lib/DebugInfo/PDB/Raw/TpiStream.cpp
Size: 30 lines

Diff 61002

llvm/trunk/lib/DebugInfo/PDB/Raw/TpiStream.cpp


 TpiStream::TpiStream(const PDBFile &File,
                      std::unique_ptr<MappedBlockStream> Stream)
     : Pdb(File), Stream(std::move(Stream)) {}

 TpiStream::~TpiStream() {}

 // Computes a hash for a given TPI record.
-template <typename T> static uint32_t getTpiHash(T &Rec) {
+template <typename T>
+static uint32_t getTpiHash(T &Rec, const CVRecord<TypeLeafKind> &RawRec) {
   auto Opts = static_cast<uint16_t>(Rec.getOptions());

-  // We don't know how to calculate a hash value for this yet.
-  // Currently we just skip it.
-  if (Opts & static_cast<uint16_t>(ClassOptions::ForwardReference))
-    return 0;
+  bool ForwardRef =
+      Opts & static_cast<uint16_t>(ClassOptions::ForwardReference);
+  bool Scoped = Opts & static_cast<uint16_t>(ClassOptions::Scoped);
+  bool UniqueName = Opts & static_cast<uint16_t>(ClassOptions::HasUniqueName);

-  if (!(Opts & static_cast<uint16_t>(ClassOptions::Scoped)))
+  if (!ForwardRef && !Scoped)
     return hashStringV1(Rec.getName());
-  if (Opts & static_cast<uint16_t>(ClassOptions::HasUniqueName))
+  if (!ForwardRef && UniqueName)
     return hashStringV1(Rec.getUniqueName());
-  // This case is not implemented yet.
-  return 0;
+  return hashBufferV8(RawRec.RawData);
 }

 namespace {
 class TpiHashVerifier : public TypeVisitorCallbacks {
 public:
   TpiHashVerifier(FixedStreamArray<support::ulittle32_t> &HashValues,
                   uint32_t NumHashBuckets)
       : HashValues(HashValues), NumHashBuckets(NumHashBuckets) {}

   Error visitUdtSourceLine(UdtSourceLineRecord &Rec) override {
     return verifySourceLine(Rec);
   }

   Error visitUdtModSourceLine(UdtModSourceLineRecord &Rec) override {
     return verifySourceLine(Rec);
   }

   Error visitClass(ClassRecord &Rec) override { return verify(Rec); }
   Error visitEnum(EnumRecord &Rec) override { return verify(Rec); }
   Error visitUnion(UnionRecord &Rec) override { return verify(Rec); }

-  Error visitTypeEnd(const CVRecord<TypeLeafKind> &Record) override {
+  Error visitTypeBegin(const CVRecord<TypeLeafKind> &Rec) override {
     ++Index;
+    RawRecord = &Rec;
     return Error::success();
   }

 private:
   template <typename T> Error verify(T &Rec) {
-    uint32_t Hash = getTpiHash(Rec);
-    if (Hash && Hash % NumHashBuckets != HashValues[Index])
+    uint32_t Hash = getTpiHash(Rec, *RawRecord);
+    if (Hash % NumHashBuckets != HashValues[Index])
       return make_error<RawError>(raw_error_code::invalid_tpi_hash);
     return Error::success();
   }

   template <typename T> Error verifySourceLine(T &Rec) {
     char Buf[4];
     support::endian::write32le(Buf, Rec.getUDT().getIndex());
     uint32_t Hash = hashStringV1(StringRef(Buf, 4));
     if (Hash % NumHashBuckets != HashValues[Index])
       return make_error<RawError>(raw_error_code::invalid_tpi_hash);
     return Error::success();
   }

   FixedStreamArray<support::ulittle32_t> HashValues;
+  const CVRecord<TypeLeafKind> *RawRecord;
   uint32_t NumHashBuckets;
-  uint32_t Index = 0;
+  uint32_t Index = -1;
 };
 }

 // Verifies that a given type record matches with a given hash value.
 // Currently we only verify SRC_LINE records.
 Error TpiStream::verifyHashValues() {
   TpiHashVerifier Verifier(HashValues, Header->NumHashBuckets);
   CVTypeVisitor Visitor(Verifier);
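The branch structure of the updated getTpiHash and the bucket comparison in verify can be restated as a standalone sketch. `HashInput`, `selectHashInput`, and `bucketMatches` are hypothetical names introduced for illustration; the hash functions themselves are deliberately left out, so this shows only which input is chosen and how the result is compared.

```cpp
#include <cassert>
#include <cstdint>

// Which data the updated getTpiHash in the diff would hash.
enum class HashInput { Name, UniqueName, RawRecord };

// Mirrors the three branches of the updated getTpiHash:
// non-forward-ref, unscoped records hash the name; non-forward-ref records
// with a unique name hash that; everything else hashes the raw record bytes.
HashInput selectHashInput(bool ForwardRef, bool Scoped, bool HasUniqueName) {
  if (!ForwardRef && !Scoped)
    return HashInput::Name;
  if (!ForwardRef && HasUniqueName)
    return HashInput::UniqueName;
  return HashInput::RawRecord;
}

// Mirrors the comparison in verify(): the hash is reduced modulo the bucket
// count and compared with the stored value. Unlike the old code, a hash of 0
// is no longer treated as "skip this record".
bool bucketMatches(uint32_t Hash, uint32_t NumHashBuckets, uint32_t Stored) {
  assert(NumHashBuckets != 0 && "bucket count must be non-zero");
  return Hash % NumHashBuckets == Stored;
}
```

Under this reading, forward references and scoped records without a unique name, which the old code skipped by returning 0, now fall through to the raw-record hash.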