This is an archive of the discontinued LLVM Phabricator instance.

[LLD][COFF] Early load PDB type server files
ClosedPublic

Authored by aganea on Apr 1 2019, 1:28 PM.

Details

Summary

In order to have all inputs loaded & available upfront when starting to merge types, we need to open PDB type server files early in the process.

PDB type servers are created when using MSVC /Zi. In that case, the debug info for a given project is stored in a single PDB type server file. All OBJ files in that project will refer to their corresponding PDB through a TypeServer2Record.

This patch also fixes a tiny issue where the PDB "Age" wasn't validated against the OBJ's TypeServer2Record.Age. The age is a concept used by MSVC for incremental compilation: the PDB's Age is incremented by MSVC on each incremental build, but the GUID remains the same.

This patch is step 2. in "Proposed commit strategy" in D59226

Diff Detail

Repository
rL LLVM

Event Timeline

aganea created this revision.Apr 1 2019, 1:28 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 1 2019, 1:28 PM
aganea edited the summary of this revision. (Show Details)Apr 1 2019, 1:32 PM

Ping! I know Reid is OOO this week and Zachary might not be available for reviewing. @ruiu could you please take a quick look at this? Thank you in advance for your time.

aganea edited the summary of this revision. (Show Details)Apr 17 2019, 6:00 AM
takuto.ikuta added inline comments.Apr 17 2019, 4:40 PM
COFF/InputFiles.cpp
44 ↗(On Diff #193163)

better to not have llvm:: prefix for type in this code when we have this?

722 ↗(On Diff #193163)

const ObjFile *File ?

745 ↗(On Diff #193163)

can be const ObjFile * here too?

758 ↗(On Diff #193163)

same here?

ruiu added a comment.Apr 17 2019, 7:01 PM

InputFile abstracts the concept of a file containing symbols. In each parse() function, we parse a symbol table and then insert symbols to the symbol table. parse() is part of a symbol resolution process step.

Since PDBInputFile doesn't contain any symbols, it doesn't fit to that concept, and it doesn't seem you need to parse them while we parse other types of input files.

So, why don't you just add a pdb file to some global list, and after resolving all symbols, parse them in a simple for loop? I guess that's conceptually much simpler.

rnk added a comment.Apr 18 2019, 11:56 AM

InputFile abstracts the concept of a file containing symbols. In each parse() function, we parse a symbol table and then insert symbols to the symbol table. parse() is part of a symbol resolution process step.

Since PDBInputFile doesn't contain any symbols, it doesn't fit to that concept, and it doesn't seem you need to parse them while we parse other types of input files.

So, why don't you just add a pdb file to some global list, and after resolving all symbols, parse them in a simple for loop? I guess that's conceptually much simpler.

IIUC, the point of this change is to start loading types in parallel as early as possible. The suggestion to accumulate the inputs on the side and wait until after symbol resolution won't exploit all the parallelism that's available.

In D60095#1472111, @rnk wrote:

InputFile abstracts the concept of a file containing symbols. In each parse() function, we parse a symbol table and then insert symbols to the symbol table. parse() is part of a symbol resolution process step.

Since PDBInputFile doesn't contain any symbols, it doesn't fit to that concept, and it doesn't seem you need to parse them while we parse other types of input files.

So, why don't you just add a pdb file to some global list, and after resolving all symbols, parse them in a simple for loop? I guess that's conceptually much simpler.

IIUC, the point of this change is to start loading types in parallel as early as possible. The suggestion to accumulate the inputs on the side and wait until after symbol resolution won't exploit all the parallelism that's available.

As long as this goes in the LinkerDriver's TaskQueue, it's fine, no? I could simply move PDBInputFile out of the InputFile hierarchy, I think that's what @ruiu was proposing?
It wasn't my goal to parallelize the TaskQueue just yet, rather introduce the ghash parallelism first, then parallelize the Type merging.

aganea updated this revision to Diff 196643.Apr 25 2019, 8:48 AM
aganea marked 4 inline comments as done.
aganea retitled this revision from [LLD][COFF] Move PDB type server loading from PDB.cp early into InputFiles.cpp and introduce PDBInputFile to [LLD][COFF] Early load PDB type server files.
aganea edited the summary of this revision. (Show Details)
aganea added a reviewer: mstorsjo.

I simplfied the patch by moving the PDBInputFile behavior into the existing TypeServerSource class. This makes things more elegant and keeps everything related to debug types into one place.

Tested (check-lld) on three different configurations:

  1. MSBuild + MSVC 15.9.11 with LLVM_ENABLE_ASSERTIONS on, in a Debug target, under Windows 10 1803.
  2. Ninja + Clang 8.0 in a Release target, under Windows 10 1803.
  3. Ninja + GCC 7.3 in a Release target, under WSL/Ubuntu 18.04.
aganea added a comment.May 7 2019, 6:30 AM

Ping! Any further suggestions? Thanks in advance for your time.

ruiu added a comment.May 20 2019, 6:52 AM

Overall looking good.

COFF/DebugTypes.cpp
33–36 ↗(On Diff #196643)

If a constructor can fail, one pattern to handle that is to make the constructor private and instead provide static ErrorOr<TypeServerSource> getInstance() function. In that function, you can load a PDB and then instantiate TypeServerSouce only when the file loading succeeded. If failed, you can return an error.

149–150 ↗(On Diff #196643)

nit: return Path + sys::path::...

aganea updated this revision to Diff 200524.May 21 2019, 9:14 AM
aganea marked 3 inline comments as done.
aganea added inline comments.May 21 2019, 9:16 AM
COFF/DebugTypes.cpp
33–36 ↗(On Diff #196643)

As suggested, added static Expected<TypeServerSource *> TypeServerSource::getInstance() and replaced previous lld::coff::makeTypeServerSource() by lld::coff::loadTypeServerSource() to be consistent.

Note - PDB load errors are simply queued in TypeServerSource::Instance, and displayed later by TypeServerSource::findFromFile() which is called by PDBLinker::maybeMergeTypeServerPDB().
I've tried displaying the errors on the spot, but at that point we don't know who requested loading of that PDB, and it feels wrong to decorelate the PDB errors from the requesting OBJ:

lld-link: warning: 'bad-block-size.pdb': The PDB file is corrupt. MSF superblock is missing
...
... (various other warnings)
...
lld-link: warning: Cannot use debug info for 'F:\svn\build\tools\lld\test\COFF\Output\pdb-type-server-native-errors.yaml.tmp.obj' [LNK4099]
>>> failed to load reference 'bad-block-size.pdb': (see above)

Given it's not a fatal error, I prefer the current behavior, it feels better from a user perspective:

ld-link: warning: Cannot use debug info for 'F:\svn\build\tools\lld\test\COFF\Output\pdb-type-server-native-errors.yaml.tmp.obj' [LNK4099]
>>> failed to load reference 'bad-block-size.pdb': The PDB file is corrupt. MSF superblock is missing
ruiu accepted this revision.May 28 2019, 6:06 AM

LGTM

This revision is now accepted and ready to land.May 28 2019, 6:06 AM
This revision was automatically updated to reflect the committed changes.

This change broke one of the tests on the Windows LLDB bot:

http://lab.llvm.org:8011/builders/lldb-x64-windows-ninja/builds/5209

Could you please have a look and revert it if you don't have a quick fix?

Looking now.

@stella.stamenova : I'll have to revert it, I'm not sure what's going on. When I run the tests on my Windows PC, many tests are tagged as 'unsupported', I don't know why:

Unsupported Tests  : 796

Whereas on your build bot, there are no 'unsupported' tests:

Expected Passes    : 1596
Expected Failures  : 11
Unsupported Tests  : 46

If I try to manually run the test that fails on the bot: C:\Users\aganea\AppData\Local\Programs\Python\Python37-32\python.exe F:/svn/buildninja/./bin/llvm-lit.py -sv -vv -a F:\svn\lldb\lit\tools\lldb-mi\exec\exec-finish.test, I also get:

UNSUPPORTED: LLDB :: tools/lldb-mi/exec/exec-finish.test (1 of 1)
Test is unsupported

If I run the commands in the test manually, I get to a point where lldb-mi doesn't want to run:

F:\svn\buildninja\tools\lldb\lit>f:\svn\buildninja\bin\lldb-mi.exe
MI: Error: Driver. LLDB Debugger.
MI: Error: Driver Manager. Driver 'Machine Interface Driver Version: 1.0.0.9' (ID:'MIDriver') initialise failed. Driver. LLDB Debugger.

Any suggestions? I'll revert the patch in the meanwhile.

stella.stamenova added a comment.EditedMay 28 2019, 2:04 PM

I am not very familiar with the lldb-mi tests specifically, but it looks like they require python to run (and are marked as unsupported otherwise):

# lldb-mi always fails without Python support
if lldbmi.was_resolved and not config.lldb_disable_python:
    config.available_features.add('lldb-mi')

My guess is that the difference comes from the options you passed to cmake at configuration time. This is what the Buildbot does:

cmake -G Ninja ../llvm -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=../install -DPYTHON_HOME="C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64" -DLLVM_ENABLE_ASSERTIONS=OFF

On Windows, I've noticed that if we don't correctly configure PYTHON_HOME, there are a variety of issues that pop up.

@stella.stamenova : I'll have to revert it, I'm not sure what's going on. When I run the tests on my Windows PC, many tests are tagged as 'unsupported', I don't know why:

Unsupported Tests  : 796

Whereas on your build bot, there are no 'unsupported' tests:

Expected Passes    : 1596
Expected Failures  : 11
Unsupported Tests  : 46

If I try to manually run the test that fails on the bot: C:\Users\aganea\AppData\Local\Programs\Python\Python37-32\python.exe F:/svn/buildninja/./bin/llvm-lit.py -sv -vv -a F:\svn\lldb\lit\tools\lldb-mi\exec\exec-finish.test, I also get:

UNSUPPORTED: LLDB :: tools/lldb-mi/exec/exec-finish.test (1 of 1)
Test is unsupported

If I run the commands in the test manually, I get to a point where lldb-mi doesn't want to run:

F:\svn\buildninja\tools\lldb\lit>f:\svn\buildninja\bin\lldb-mi.exe
MI: Error: Driver. LLDB Debugger.
MI: Error: Driver Manager. Driver 'Machine Interface Driver Version: 1.0.0.9' (ID:'MIDriver') initialise failed. Driver. LLDB Debugger.

Any suggestions? I'll revert the patch in the meanwhile.

aganea added a comment.Jun 3 2019, 5:44 AM

Just for the record, the lldb tests issues above were caused first by a missing SWIG installation on my PC, then by using wrong SWIG version (4 instead if 3), and then my locale being fr_CA (which still causes a few tests to fail).

I've added a test in LLD to cover the issue raised by LLDB :: tools/lldb-mi/exec/exec-finish.test. See lld/trunk/test/COFF/pdb-type-server-invalid-signature.yaml L20.

Relanded as rL362393.

rnk added a comment.Jun 3 2019, 3:02 PM

I'm seeing a lot of LNK4099 warnings after this change, and Chromium links with -Werror. I also saw a fair number while doing some local testing with asan. I'll collect some more information, but I wanted to give a heads up that this might be problematic.

aganea added a comment.Jun 3 2019, 3:45 PM
In D60095#1528125, @rnk wrote:

I'm seeing a lot of LNK4099 warnings after this change, and Chromium links with -Werror. I also saw a fair number while doing some local testing with asan. I'll collect some more information, but I wanted to give a heads up that this might be problematic.

I can take a look asap if you can provide a complete warning, along with the incriminated OBJ path and corresponding PDB?

aganea added a comment.Jun 3 2019, 3:50 PM
In D60095#1528125, @rnk wrote:

I'm seeing a lot of LNK4099 warnings after this change, and Chromium links with -Werror. I also saw a fair number while doing some local testing with asan. I'll collect some more information, but I wanted to give a heads up that this might be problematic.

How do you test locally with asan? ninja check-asan?

aganea added a comment.Jun 3 2019, 4:50 PM

You're right, there's indeed an issue, checking now.

rnk added a comment.Jun 4 2019, 1:00 PM

Looks like I wrote the following comment, but didn't submit it yesterday. Maybe I felt it didn't have enough information. Anyway, I see the proposed fix, I'll try it out.

In D60095#1528125, @rnk wrote:

I'm seeing a lot of LNK4099 warnings after this change, and Chromium links with -Werror. I also saw a fair number while doing some local testing with asan. I'll collect some more information, but I wanted to give a heads up that this might be problematic.

This happens when linking hello world:

$ clang-cl -c t.cpp && lld-link t.obj -debug
lld-link: warning: Cannot use debug info for 'libcmt.lib(gs_cookie.obj)' [LNK4099]
>>> failed to load reference 'd:\agent\_work\1\s\binaries\amd64ret\lib\amd64\libcmt.amd64.pdb': The signature does not match; the file(s) might be out of date.

lld-link: warning: Cannot use debug info for 'libcmt.lib(gs_report.obj)' [LNK4099]
>>> failed to load reference 'd:\agent\_work\1\s\binaries\amd64ret\lib\amd64\libcmt.amd64.pdb': The signature does not match; the file(s) might be out of date.

lld-link: warning: Cannot use debug info for 'libcmt.lib(loadcfg.obj)' [LNK4099]
>>> failed to load reference 'd:\agent\_work\1\s\binaries\amd64ret\lib\amd64\libcmt.amd64.pdb': The signature does not match; the file(s) might be out of date.
...
aganea added a comment.Jun 4 2019, 2:12 PM
This comment was removed by aganea.