This is an archive of the discontinued LLVM Phabricator instance.

[PDB] Improve error handling when writes fail
ClosedPublic

Authored by rnk on May 18 2021, 12:37 PM.

Details

Summary

Handle PDB writing errors like any other error in LLD: emit an error and
continue. This allows the linker to print timing data and summary data
after linking, which can be helpful for finding PDB size problems. Also
report how large the file would have been.

Example output:

lld-link: error: Output data is larger than 4 GiB. File size would have been 6,937,108,480
lld-link: error: failed to write PDB file ./chrome.dll.pdb

Summary

   33282 Input OBJ files (expanded from all cmd-line inputs)
       4 PDB type server dependencies
       0 Precomp OBJ dependencies
33396931 Input type records

... snip ...

Input File Reading:           59756 ms ( 45.5%)
GC:                            7500 ms (  5.7%)
ICF:                           3336 ms (  2.5%)
Code Layout:                   6329 ms (  4.8%)
PDB Emission (Cumulative):    46192 ms ( 35.2%)
  Add Objects:                27609 ms ( 21.0%)
    Type Merging:             16740 ms ( 12.8%)
    Symbol Merging:           10761 ms (  8.2%)
  Publics Stream Layout:       9383 ms (  7.1%)
  TPI Stream Layout:           1678 ms (  1.3%)
  Commit to Disk:              3461 ms (  2.6%)

Total Link Time: 131244 ms (100.0%)

Diff Detail

Event Timeline

rnk created this revision.May 18 2021, 12:37 PM
rnk requested review of this revision.May 18 2021, 12:37 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 18 2021, 12:37 PM
aganea accepted this revision.May 18 2021, 12:56 PM

Strangely, a few weeks ago I had a similair issue while trying to link an executable with MSVC-built .OBJ files. My PDB with lld-link.exe would have been 7.7 GB, but when linking the same .OBJ files with link.exe the PDB is 2.7 GB. Since we moved off MSVC, I haven't investigated in detail why. The problem doesn't occur on the same target when using Clang. I'm curious, is that your case?

In any case, LTGM.

This revision is now accepted and ready to land.May 18 2021, 12:56 PM
rnk added a comment.May 18 2021, 1:10 PM

Strangely, a few weeks ago I had a similair issue while trying to link an executable with MSVC-built .OBJ files. My PDB with lld-link.exe would have been 7.7 GB, but when linking the same .OBJ files with link.exe the PDB is 2.7 GB. Since we moved off MSVC, I haven't investigated in detail why. The problem doesn't occur on the same target when using Clang. I'm curious, is that your case?

In any case, LTGM.

Yes, our PDBs end up getting too big when we use either coverage (https://crbug.com/1159468) or static O0 builds (https://crbug.com/1179085). I think the common denominator between those two build configs is that they do not delete trivial inline functions, so you end up with standalone definitions of every little helper like std::unique_ptr<>::get(), and that somehow adds up to more than if those things were inlined everywhere.

Thanks!

This revision was landed with ongoing or failed builds.May 18 2021, 1:20 PM
This revision was automatically updated to reflect the committed changes.

We had similar OOM issues on the bots, as mentionned in the two bug reports. Regarding https://bugs.chromium.org/p/chromium/issues/detail?id=1179085, if you're using /DEBUG:GHASH the following patch might fix it? (but still generates > 4 GB .PDBs, which is probably a merging bug)

diff --git a/lld/COFF/DebugTypes.cpp b/lld/COFF/DebugTypes.cpp
index fedcb054540f..8bd8723f4bf2 100644
--- a/lld/COFF/DebugTypes.cpp
+++ b/lld/COFF/DebugTypes.cpp
@@ -1064,7 +1064,8 @@ void TypeMerger::mergeTypesWithGHash() {
 
   // Cap the table size so that we can use 32-bit cell indices. Type indices are
   // also 32-bit, so this is an inherent PDB file format limit anyway.
-  tableSize = std::min(size_t(INT32_MAX), tableSize);
+  tableSize =
+      std::min(size_t(INT32_MAX) - TypeIndex::FirstNonSimpleIndex, tableSize);
   ghashState.table.init(static_cast<uint32_t>(tableSize));
 
   // Insert ghashes in parallel. During concurrent insertion, we cannot observe

The issue was that source->indexMapStorage[i] = TypeIndex::fromArrayIndex(cellIdx); would store invalid offsets since the high bit (DecoratedItemIdMask) is dismissed when doing TypeIndex::toArrayIndex().

rnk added a comment.May 18 2021, 4:04 PM

We had similar OOM issues on the bots, as mentionned in the two bug reports. Regarding https://bugs.chromium.org/p/chromium/issues/detail?id=1179085, if you're using /DEBUG:GHASH the following patch might fix it? (but still generates > 4 GB .PDBs, which is probably a merging bug)

diff --git a/lld/COFF/DebugTypes.cpp b/lld/COFF/DebugTypes.cpp
index fedcb054540f..8bd8723f4bf2 100644
--- a/lld/COFF/DebugTypes.cpp
+++ b/lld/COFF/DebugTypes.cpp
@@ -1064,7 +1064,8 @@ void TypeMerger::mergeTypesWithGHash() {
 
   // Cap the table size so that we can use 32-bit cell indices. Type indices are
   // also 32-bit, so this is an inherent PDB file format limit anyway.
-  tableSize = std::min(size_t(INT32_MAX), tableSize);
+  tableSize =
+      std::min(size_t(INT32_MAX) - TypeIndex::FirstNonSimpleIndex, tableSize);
   ghashState.table.init(static_cast<uint32_t>(tableSize));
 
   // Insert ghashes in parallel. During concurrent insertion, we cannot observe

The issue was that source->indexMapStorage[i] = TypeIndex::fromArrayIndex(cellIdx); would store invalid offsets since the high bit (DecoratedItemIdMask) is dismissed when doing TypeIndex::toArrayIndex().

I see, I guess we do need that patch.

I thought I was using ghashing, but it turned out I wasn't. I switched to ghash, and things got faster, even without ghashes in object files. This has given me motivation to make ghashing the default type merging strategy. I'll have to follow up on that.