This is an archive of the discontinued LLVM Phabricator instance.

[PDB] Parse module information from the DBI stream
ClosedPublic

Authored by zturner on Apr 26 2016, 4:41 PM.

Details

Summary

With this patch we extract basic module information from the PDB file. A module is what is referred to in the DIA SDK and DebugInfoPDB as a "compiland". So we are essentially enumerating all compilands and dumping some of their metadata. In particular we get the compiland name and contributing object files, as well as various indices that will point at more detailed symbol and type information to be used in subsequent patches.

With this, we are close to being able to implement PDBSymbolExe and PDBSymbolCompiland in terms of the raw interface.

Diff Detail

Event Timeline

zturner updated this revision to Diff 55126.Apr 26 2016, 4:41 PM
zturner retitled this revision from to [PDB] Parse module information from the DBI stream.
zturner updated this object.
zturner added a reviewer: ruiu.
zturner added a subscriber: llvm-commits.
zturner updated this revision to Diff 55136.Apr 26 2016, 5:15 PM

Fixed the post-increment operator on the stream iterator.

ruiu added inline comments.Apr 26 2016, 5:27 PM
include/llvm/DebugInfo/PDB/Raw/ModInfo.h
60

Isn't it a bit confusing that this Bytes is actually bytes but Bytes in ModInfo is a ModInfoBytes object.

63

Can you use iterator_range<ModInfoStreamIterator> instead of defining a new class?

lib/DebugInfo/PDB/Raw/ModInfo.cpp
50

Do you mean ulittle32_t by long?

zturner added inline comments.Apr 26 2016, 5:41 PM
lib/DebugInfo/PDB/Raw/ModInfo.cpp
51

I don't think so, but honestly I don't know. We're reading this value from a file, and it doesn't make sense for a pointer to be in a file, because it would be meaningless in the address space of the process reading the file. So far in the limited experiments I've run, the value of this pointer (and the other pointer earlier in the class definition) are always 0 when reading them in. My best guess is that they use these as bookkeeping during the life of the PDB, but the value that goes to disk is unimportant. I guess it's something we will figure out in time as we study the format more closely.

ruiu added inline comments.Apr 26 2016, 5:44 PM
lib/DebugInfo/PDB/Raw/ModInfo.cpp
51

Since you are not using this field at least at this moment, why don't you use uint32_t or ulittle32_t? I don't think this code will pass tests on LP64 machines because on that machine long* is 64-bit wide.

zturner updated this revision to Diff 55149.Apr 26 2016, 6:03 PM

Fixed review issues.

ruiu accepted this revision.Apr 26 2016, 6:06 PM
ruiu edited edge metadata.

LGTM

This revision is now accepted and ready to land.Apr 26 2016, 6:06 PM
majnemer added inline comments.
lib/DebugInfo/PDB/Raw/ModInfo.cpp
53โ€“56

Won't it be 64-bit in size on a 64-bit system?

64

Incrementing an iterator past end looks suspicious.

zturner marked 3 inline comments as done.Apr 26 2016, 6:36 PM
zturner added inline comments.
lib/DebugInfo/PDB/Raw/ModInfo.cpp
53โ€“56

Not as far as I can tell. They have 2 different structures that are almost the same, one using a uint32 and one using a pointer. The former is the file layout format, and the latter is an in-memory format. But since we aren't using these fields yet anyway, it doesn't make sense to add the additional complexity, as we may find a better less convoluted way to do the same thing, or we may end up just not needing those fields anyway.

64

In the general case I agree, but in this case we have two null terminated strings stored back to back in a variable length structure, so this seems like the most elegant way to express that. Otherwise it's going to end up just being equivalent code but more verbose, since I will have to save off the module name into a temporary, then call module.data() + module.length() + 1.

ruiu added inline comments.Apr 26 2016, 6:38 PM
lib/DebugInfo/PDB/Raw/ModInfo.cpp
64

Yeah, I agree with Zach. This is where we intentionally overrun the end of the array to access stuff next to the string. This makes me wonder if C++ supports flexible array members.

zturner updated this revision to Diff 55153.Apr 26 2016, 6:46 PM
zturner edited edge metadata.

Fixed alignment issue by adding 2 bytes of padding. When I was using a pointer it was getting 4-byte aligned, but when I changed to a ulittle32_t it was getting only 2-byte aligned, messing up the rest of the structure. This should be correct now. Alternatively I could have used an aligned_ulittle32_t, but the name is so long it made the comment formatting really ugly.

ruiu added a comment.Apr 26 2016, 6:48 PM

I have no further comments. David?

zturner updated this revision to Diff 55154.Apr 26 2016, 6:50 PM

Also fixed a place where I was accidentally using raw integer types instead of endian types in the section contributions.

zturner added a comment.EditedApr 26 2016, 6:56 PM

Gah, is it better to use the aligned types, or to keep manually adding padding? The latest update also breaks the definition of SCBytes because:

uint16_t x;
uint32_t y;

is different from

ulittle16_t x;
ulittle32_t y;

Should I just switch to using aligned types everywhere, keeping in mind that the original definition is:

unsigned short;
unsigned int;

and assumes MSVC.

ruiu added a comment.Apr 26 2016, 6:59 PM

I'd personally prefer adding padding field because it is explicit and it is I believe more common than implicit padding when describing data structure in file.

By the way, we don't care about endianness of padding, so I think

uint8_t Pad1[4];

is more common than

ulittle16_t Pad1;
zturner updated this revision to Diff 55252.Apr 27 2016, 10:09 AM

Fix padding issues everywhere.

David, are you satisfied with the explanation about incrementing past the end of the iterator earlier, or do you have a suggestion for a better way to accomplish the same thing?

zturner closed this revision.May 6 2016, 4:44 PM