This is an archive of the discontinued LLVM Phabricator instance.

Native PDB reader was swapping module and object file names
AbandonedPublic

Authored by amccarth on Mar 14 2017, 1:37 PM.

Details

Reviewers
zturner
Summary

The native PDB reader code had inadvertently swapped the module names with the object file names. This corrects that and updates the affected tests.

Given some of the examples, it's easy to understand how this could be confusing. For the "main" object, the module name and the object name are the same. For certain Windows and MS CRT APIs, the module is a .lib but the object file is an import from a DLL (see Windows API sets).

Diff Detail

Event Timeline

amccarth created this revision.Mar 14 2017, 1:37 PM
zturner edited edge metadata.Mar 14 2017, 3:33 PM

So here is why I think it was originally correct:

The method that computes the address of the first character of the "object file name" first gets the address of the "module name", then adds the length of the module name plus 1 byte for the null terminator. So at least in the reference code here, module name comes before object file name.

It could just be that they chose poor names themselves, or are using the terms incorrectly. However, when I run llvm-pdbdump pretty on the file empty.pdb that is in the test file Inputs directory, it gives me this:

D:\src\llvmbuild\ninja>bin\llvm-pdbdump.exe pretty -compilands d:\src\llvm-mono\llvm\test\DebugInfo\pdb\Inputs\empty.pdb
Summary for d:\src\llvm-mono\llvm\test\DebugInfo\pdb\Inputs\empty.pdb
  Size: 102400 bytes
  Guid: {0B355641-86A0-A249-896F-9988FAE52FF0}
  Age: 1
  Attributes: HasPrivateSymbols
---COMPILANDS---
  d:\src\llvm\test\DebugInfo\PDB\Inputs\empty.obj
  * Linker *

The strings that are printed there are obtained by calling IDiaSymbol::get_Name(). For Compiland Symbols, get_Name() is documented as returning "the name of the compiland's object file" and get_LibraryName() is documented as returning "Name of the library or object file where object was loaded from." I never call get_libraryName() anywhere, but I think it's safe to assume that get_name() returns either the "module name" or the "object name", and get_libraryName returns the other.

Comparing to llvm-pdbdump raw, I get this:

D:\src\llvmbuild\ninja>bin\llvm-pdbdump.exe raw -modules d:\src\llvm-mono\llvm\test\DebugInfo\pdb\Inputs\empty.pdb
DBI Stream {
  Dbi Version: 19990903
  Age: 1
  Incremental Linking: Yes
  Has CTypes: No
  Is Stripped: No
  Machine Type: x86
  Symbol Record Stream Index: 8
  Public Symbol Stream Index: 7
  Global Symbol Stream Index: 6
  Toolchain Version: 12.0
  mspdb120.dll version: 12.0.31101
  Modules [
    {
      Name: d:\src\llvm\test\DebugInfo\PDB\Inputs\empty.obj
      Debug Stream Index: 12
      Object File Name: d:\src\llvm\test\DebugInfo\PDB\Inputs\empty.obj
      Num Files: 1
      Source File Name Idx: 0
      Pdb File Name Idx: 0
      Line Info Byte Size: 0
      C13 Line Info Byte Size: 88
      Symbol Byte Size: 208
      Type Server Index: 0
      Has EC Info: No
    }
    {
      Name: * Linker *
      Debug Stream Index: 14
      Object File Name:
      Num Files: 0
      Source File Name Idx: 0
      Pdb File Name Idx: 1
      Line Info Byte Size: 0
      C13 Line Info Byte Size: 0
      Symbol Byte Size: 516
      Type Server Index: 0
      Has EC Info: No
    }
  ]
}

So what I'm referring to as "name" and "object file name" appear to match up with the reference implementation. I'm not sure what the best thing to do here is, but having the variables be called the reverse of what they're called in the reference implementation could leave to even more confusion when someone else comes along and tries to understand the code.

Thoughts?

amccarth abandoned this revision.Mar 14 2017, 4:34 PM

I'm not completely convinced this is the wrong thing to do, but I won't fight it further. I've compensated for the backwards naming in my next patch.