This is an archive of the discontinued LLVM Phabricator instance.

[LLD][COFF] Generate import modules in PDB
ClosedPublic

Authored by aganea on Nov 21 2018, 8:05 AM.
Tokens
"Like" token, awarded by santagada.

Details

Summary

This patch makes LLD generate a PDB with import modules & coff groups.
This is suitable for hot patching tools like Recode and Live++, which need to extract this information from the PDB. We're using this patch on several productions for a few months now.


We generate one debug symbol stream for each imported dll:

                          Streams
============================================================
[...]
  Stream 10 ( 256 bytes): [Module "Import:pdb-publics-import.test.tmp1.dll"]
             Blocks: [9]

We generate two modules for each imported dll:
(currently the first module is empty because llvm-lib does not generate the additional symbol stream, and some descriptors are not imported (remains TODO)

                        Module Stats
============================================================
[...]
Mod 0001 | `pdb-publics-import.test.tmp1.dll`:
  Mod 1 (debug info not present): [pdb-publics-import.test.tmp1.dll]
Mod 0002 | `Import:pdb-publics-import.test.tmp1.dll`:
  Stream 10, 256 bytes

    Symbols
                                       Total:       8 entries (     248 bytes)
    --------------------------------------------------------------------------
                                   S_THUNK32:       2 entries (      72 bytes)
                                       S_END:       2 entries (       8 bytes)
                                   S_OBJNAME:       2 entries (      88 bytes)
                                  S_COMPILE3:       2 entries (      80 bytes)

    Chunks
                                       Total:       0 entries (       0 bytes)
    --------------------------------------------------------------------------

We ensure the modules' first contrib section is properly set (65535 values):

                          Modules
============================================================
[...]
Mod 0001 | `pdb-publics-import.test.tmp1.dll`:
SC[???]  | mod = 65535, 65535:0000, size = -1, data crc = 0, reloc crc = 0
        none
Obj: `F:\svn\build2\tools\lld\test\COFF\Output\pdb-publics-import.test.tmp1.lib`:
debug stream: 65535, # files: 0, has ec info: false
pdb file ni: 0 ``, src file ni: 0 ``
Mod 0002 | `Import:pdb-publics-import.test.tmp1.dll`:
SC[.text]  | mod = 2, 0001:0032, size = 6, data crc = 0, reloc crc = 0
        IMAGE_SCN_CNT_CODE | IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_MEM_READ
Obj: `F:\svn\build2\tools\lld\test\COFF\Output\pdb-publics-import.test.tmp1.lib`:
debug stream: 10, # files: 0, has ec info: false
pdb file ni: 0 ``, src file ni: 0 ``

We generate proper symbol streams for each import module, in the same way as link.exe:

                          Symbols
============================================================
[...]
Mod 0002 | `Import:pdb-publics-import.test.tmp1.dll`:
     4 | S_OBJNAME [size = 44] sig=0, `pdb-publics-import.test.tmp1.dll`
    48 | S_COMPILE3 [size = 40]
         machine = intel x86-x64, Ver = LLVM Linker, language = link
         frontend = 0.0.0.0, backend = 14.10.25019.0
         flags = none
    88 | S_THUNK32 [size = 36] `exportfn1`
         parent = 0, end = 124, next = 0
         kind = thunk, size = 6, addr = 0001:0016
   124 | S_END [size = 4]
   128 | S_OBJNAME [size = 44] sig=0, `pdb-publics-import.test.tmp1.dll`
   172 | S_COMPILE3 [size = 40]
         machine = intel x86-x64, Ver = LLVM Linker, language = link
         frontend = 0.0.0.0, backend = 14.10.25019.0
         flags = none
   212 | S_THUNK32 [size = 36] `exportfn2`
         parent = 0, end = 248, next = 0
         kind = thunk, size = 6, addr = 0001:0032
   248 | S_END [size = 4]

And we also generate COFF groups in the * Linker * section, out from PartialSections (that is, unmerged sections).
(there's a subtly here: the .idata$XXX COFF groups have the "write" flag set, whereas the corresponding .idata section does not have it - this is what link.exe does)

Mod 0003 | `* Linker *`:
[...]
   588 | S_SECTION [size = 28] `.text`
         length = 38, alignment = 12, rva = 4096, section # = 1
         characteristics =
           code
           execute permissions
           read permissions
   616 | S_COFFGROUP [size = 24] `.text`
         length = 8, addr = 0001:0000
         characteristics =
           code
           execute permissions
           read permissions
   640 | S_SECTION [size = 28] `.rdata`
         length = 61, alignment = 12, rva = 8192, section # = 2
         characteristics =
           initialized data
           read permissions
   668 | S_SECTION [size = 28] `.idata`
         length = 145, alignment = 12, rva = 12288, section # = 3
         characteristics =
           initialized data
           read permissions
   696 | S_COFFGROUP [size = 28] `.idata$2`
         length = 40, addr = 0003:0000
         characteristics =
           initialized data
           read permissions
           write permissions
   724 | S_COFFGROUP [size = 28] `.idata$4`
         length = 24, addr = 0003:0040
         characteristics =
           initialized data
           read permissions
           write permissions
   752 | S_COFFGROUP [size = 28] `.idata$5`
         length = 24, addr = 0003:0064
         characteristics =
           initialized data
           read permissions
           write permissions
   780 | S_COFFGROUP [size = 28] `.idata$6`
         length = 24, addr = 0003:0088
         characteristics =
           initialized data
           read permissions
           write permissions
   808 | S_COFFGROUP [size = 28] `.idata$7`
         length = 33, addr = 0003:0112
         characteristics =
           initialized data
           read permissions
           write permissions

Diff Detail

Repository
rL LLVM

Event Timeline

aganea created this revision.Nov 21 2018, 8:05 AM

Please note this work was started from Stefan Reinalter's D49230 and D49231. We agreed offline with Stefan that I will finish those patches.

The file below shows the remaining differences betwen link and lld-link (with this patch). I've modified lld\test\COFF\Output\pdb-publics-import.test to use link.exe instead of lld-link.exe

Do you happen to know what these are used for? I'm ok with supporting them just for the sake of compatibility, but it's always nice to know "if these aren't present, X won't work"

Do you happen to know what these are used for? I'm ok with supporting them just for the sake of compatibility, but it's always nice to know "if these aren't present, X won't work"

Some edit-and-continue tools rely on this: Recode and Live++. These tools need to extract and repro the linking with the EXE/PDB alone. They need this information for recompiling TUs that change, and hot patch the process in memory.

aganea added a comment.EditedNov 21 2018, 10:44 AM

This all boils down to iterativity. Using the tools I've mentionned above allows for really quick iterations, when you need to change small bits of code. When you're iterating over some gameplay or UI code, for example.
Without these tools, for testing any small change, you need to: compile the TU, link the whole EXE, start it, load the game map, and do a walkthrough until you're at the right position in the game world. This can take more than 10 minutes.
With these edit-and-continue tools, you don't shut down the game. The player remains at the same position in the game. You simply change the code, the tool applies the changes, and that's it. You can go for hundreds of iterations like this. Live++ even allows the game to save/reload data structures, which means that you can change pretty much anything in the code, including class definitions.

rnk added a comment.Nov 26 2018, 6:43 PM

Cool. :)

lld/trunk/COFF/Driver.cpp
1294

Do you want to submit this separately? It seems to cause wide ranging mechanical changes to the tests. Feel free to submit it without review.

lld/trunk/COFF/Writer.cpp
844–846

This seems like it's really just replicating Map into some longer lived data structure, and then creating a backreference from the output section to the concatenated input section. There are three other helper functions that take a reference to Map, and that type is ridiculously long. I think we could clean this code up significantly if we planned some NFC changes first to make Map a member of Writer and then give it a real key and value type. WDYT? I'm imagining that the type would become: std::map<NameAndChars, InputSection>

That would have the side effect of cleaning up a lot of unreadable Pair.second and Pair.first.first accesses in this code.

Honestly I started looking at this at first just because we copy the Chunks vector again, and I was wondering if that could be avoided, especially for a no-PDB build...

pcc added inline comments.Nov 26 2018, 6:50 PM
lld/trunk/COFF/Driver.cpp
1294

Are you sure that this is what link.exe does? I'm pretty sure I added this here after noticing that this was what link.exe did, see D45737

aganea marked 2 inline comments as done.Nov 27 2018, 7:15 AM

After further testing, in some scenarios this patch doesn't seem to be enough for Recode to work. I'll pause this patch and follow up with Will (@lantictac) to ensure everything is covered.

lld/trunk/COFF/Driver.cpp
1294

@pcc: Please see the zip file above for comparing the output from different linkers.

I figured out a few more things with link.exe along the way:

  • S_TRAMPOLINE records are used for jump thunks when incremental linking is active. Of course this makes the produced EXE to execute slower because there are missed function inlining oportunities.
  • The * Linker * module has a section contribution in .idata only when incremental linking is specified. Otherwise, there's no contrib section, only a debug stream.
  • There's a lot more .text padding (CCCCCC) around functions when incremental linking is active.
  • More importantly, link.exe folds .idata sections into .rdata only when /INCREMENTAL:NO is specified. Given that /INCREMENTAL is the default, I thought we should do the same in LLD. I would need to disable that behavior when providing /INCREMENTAL:NO (not in this patch yet).

Doc and defaults for /INCREMENTAL are here.

lld/trunk/COFF/Writer.cpp
844–846

Agreed. Will do that.

Actually, the question about folding .idata into .rdata remains open. Should we follow the same behavior as link.exe or not, to reduce variability in the output?

rnk added a comment.Nov 27 2018, 8:22 AM

Actually, the question about folding .idata into .rdata remains open. Should we follow the same behavior as link.exe or not, to reduce variability in the output?

I don't think we intend to support incremental linking, so I'd just fold idata into rdata by default as we do now.

aganea added a comment.EditedNov 28 2018, 7:25 AM
In D54802#1309791, @rnk wrote:

I don't think we intend to support incremental linking, so I'd just fold idata into rdata by default as we do now.

I'd say incremental linking is the only major hurdle for widespread LLD adoption on industrial sites. Like we discussed at the LLVM conf., I agree that EXEs/DLLs shouldn't be incrementally linked. But what about incremental PDB generation?

For our typical gameplay developement loop, for a DLL target, this is what we currently see:

  Input File Reading:          1847 ms (  3.7%)
  Code Layout:                  579 ms (  1.2%)
  PDB Emission (Cumulative):  43468 ms ( 87.8%)
    Add Objects:              33074 ms ( 66.8%)
      Type Merging:           28276 ms ( 57.1%)
      Symbol Merging:          4748 ms (  9.6%)
    TPI Stream Layout:         1487 ms (  3.0%)
    Globals Stream Layout:     1789 ms (  3.6%)
    Commit to Disk:            5981 ms ( 12.1%)
  Commit Output File:          2837 ms (  5.7%)
-------------------------------------------------
Total Link Time:              49494 ms (100.0%)

That is for the biggest DLL in the game, and inherently the one that programmers iterate the most on:

-- DLL: 182 MB
-- PDB: 930 MB
-- Merged OBJs: 179 (no GHASH, OBJs are from cl.exe; these are unity/blob .cpp files)
-- Inserted type records: 91,573,186
-- Final unique type records: 3,578,658 (hash table size: 8,388,608)
-- TypeTable.RecordStorage: 1,639,715,884 bytes allocated

For the same thing, link.exe /INCREMENTAL takes about 3 sec to relink any change. If we could at least do the Type/Symbol passes incrementally, that would be a major win.

aganea added a comment.EditedNov 28 2018, 7:44 AM

Sorry that was the wrong timings - that was for MSVC-built LLD. With Clang 7.0 things are a bit better in reality:

  Input File Reading:          1658 ms (  4.7%)
  Code Layout:                  621 ms (  1.8%)
  PDB Emission (Cumulative):  30380 ms ( 86.7%)
    Add Objects:              22615 ms ( 64.6%)
      Type Merging:           19205 ms ( 54.8%)
      Symbol Merging:          3385 ms (  9.7%)
    TPI Stream Layout:          897 ms (  2.6%)
    Globals Stream Layout:     1418 ms (  4.1%)
    Commit to Disk:            4559 ms ( 13.0%)
  Commit Output File:          1717 ms (  4.9%)
-------------------------------------------------
Total Link Time:              35021 ms (100.0%)

I noticed by the way that the default cmake config for Clang/LLVM/LLD comes with /INCREMENTAL and /MD (dll CRT) active. This causes a lot of thunking. I changed that to /INCREMENTAL:NO and /MT and things are a lot better - I'll send another review for that.

rnk added a comment.Nov 28 2018, 1:21 PM
  Input File Reading:          1658 ms (  4.7%)
  Code Layout:                  621 ms (  1.8%)
  PDB Emission (Cumulative):  30380 ms ( 86.7%)
    Add Objects:              22615 ms ( 64.6%)
      Type Merging:           19205 ms ( 54.8%)
      Symbol Merging:          3385 ms (  9.7%)
    TPI Stream Layout:          897 ms (  2.6%)
    Globals Stream Layout:     1418 ms (  4.1%)
    Commit to Disk:            4559 ms ( 13.0%)
  Commit Output File:          1717 ms (  4.9%)
-------------------------------------------------
Total Link Time:              35021 ms (100.0%)

I think we can optimize copying symbols using the same techniques we've used for optimizing LLD. Symbol records are pretty straightforward: You figure out which .debug$S sections are live, relocate them, process them a bit, and copy the bytes to the PDB in some order. In D54554 I did some work, and that cuts down the "Commit Output File" (13%) and "Symbol Merging" times, but there may be more things to do.

For types, I think this is one of the classic problems where you make a hash table and say "it's O(1) insertion" but really it's order "length of key", and probably with a high constant factor. I think if we looked at the distribution of type record sizes, we'd see a few categories like this:

  1. Small records, < 20bytes, i.e. less than a SHA-1, like pointer types, qualified (const/volatile) types, array types, function types, etc. Merging theses in the linker should be cheap, ghash or no ghash.
  2. Records with names, like LF_STRUCTURE and LF_PROC_ID. C++ mangled names can get long, but they probably aren't 64K, the max CV record length, bytes long. Think ~4KB.
  3. Long field lists. LF_FIELDLIST is a bear. I expect on average most field lists are 32KB plus, since they include the name of every field, method, and typedef. Templates tend to have lots of members, many of which are unused in most instantiations.

The other thing to keep in mind is that type hash table "hits" are common. Most types are duplicates. Given some 64KB field list record, you can expect it will be deduplicated O(#objects) times. Without the content hash, this means you have to memcmp the entire 64KB for every duplicate. With the hash, you do O(length of hash) work, so a 8 or 20 byte memcmp. I think that's where the real ghash gains are coming from. I'm not sure what to do with that information, but it seems like a useful insight.

One thing we talked about a while ago was trying to parallelize type merging. I think the challenge there is it requires a good concurrent hash map implementation, which we'd have to find or implement ourselves.

thakis added a subscriber: thakis.Nov 28 2018, 3:35 PM
thakis added inline comments.
lld/trunk/COFF/PDB.cpp
1586

This regresses PDBs being independent of the build directory, cf r344061

aganea marked an inline comment as done.Nov 28 2018, 4:32 PM
In D54802#1311943, @rnk wrote:

I think the challenge there is it requires a good concurrent hash map implementation, which we'd have to find or implement ourselves.

I can provide that, we have a good lockless, insert-only, DenseHash implemention that we have been using for many years.

Although the issue at stake here is the high memory bandwith, due to the large volume of data to process in a tight loop. Like you are pointing out, GHash uses a smaller key size (8 bytes) which packs more entries per cache line (instead of the {hash, ArrayRef<U8>} key (24 bytes) when GHash isn’t used). I’m not entierely sure multithreading will help a lot, but we can give it a try. Packing more tightly the data in the DenseHash, and avoiding indirections at all cost would probably benefit more in the short term.

lld/trunk/COFF/PDB.cpp
1586

Thanks for pointing out!

mstorsjo added inline comments.Feb 4 2019, 12:36 PM
lld/trunk/test/COFF/hello32.test
60 ↗(On Diff #174923)

I didn't read the whole patch in detail, but can you give a TL;DR about what's added to the output files in the case when no PDB output is enabled, that shuffles some sections forward now?

aganea planned changes to this revision.Feb 4 2019, 12:47 PM
aganea marked an inline comment as done.

I still need to work on this patch. I'll mark it as pending in the meanwhile.

lld/trunk/test/COFF/hello32.test
60 ↗(On Diff #174923)

Please see lld/trunk/COFF/Driver.cpp, L1226, I initially removed parseMerge(".idata=.rdata");

And the comment above:
"More importantly, link.exe folds .idata sections into .rdata only when /INCREMENTAL:NO is specified. Given that /INCREMENTAL is the default, I thought we should do the same in LLD. I would need to disable that behavior when providing /INCREMENTAL:NO (not in this patch yet)."

However, it was decided that we'll keep the current .idata merging behavior for now.

mstorsjo added inline comments.Feb 4 2019, 12:50 PM
lld/trunk/test/COFF/hello32.test
60 ↗(On Diff #174923)

Ok, I see. Thanks!

Herald added a project: Restricted Project. · View Herald TranscriptMar 6 2019, 7:08 AM
Herald added a subscriber: jdoerfert. · View Herald Transcript
aganea updated this revision to Diff 190888.EditedMar 15 2019, 1:32 PM
aganea marked 6 inline comments as done.

Simplify & addressed comments.

I took the liberty to remove COFF groups from the MinGW target. This was adding a significant footprint to the PDB, given that each function is in its own .text section.
As an additionnal side-effect, MinGW PDBs are now much smaller, because we don't emit empty debug streams anymore.

As an example, using Martin's test package, I generated libqt_plugin.pdb:

trunk21 MB
coff groupsimport modules27.1 MB
(this patch)no coff groupsimport modules13.5 MB
no coff groupsno import modules13.3 MB

The impact is less significat in non-MinGW, on a 1 GB PDB this patch adds 2 MB and scales with the size (actually depends on the number of imported libraries).

aganea edited the summary of this revision. (Show Details)Mar 15 2019, 1:35 PM
aganea edited the summary of this revision. (Show Details)Mar 15 2019, 1:40 PM
rnk added inline comments.Mar 15 2019, 2:10 PM
lld/trunk/COFF/PDB.cpp
1109–1110

Is this relevant to this change?

1577

DllModSet, maybe? DllToModuleDbi?

1595–1597

Usually we use Twine to avoid the cost of concatenating strings used for debugging purposes, like names for instructions that aren't used in release mode, or log statements. In this case, I think you can use plain old std::string, something like addModuleInfo(std::string("Import:") + File->DLLName. With that simplification, I think the lambda is unnecessary.

llvm/trunk/lib/DebugInfo/PDB/Native/ModuleDebugStream.cpp
40–50 ↗(On Diff #190888)

I don't see the change to add this method to the class in this patch. Was this change intended to be part of this review? It seems unnecessary for import files.

aganea marked 3 inline comments as done.Mar 15 2019, 2:21 PM
aganea added inline comments.
lld/trunk/COFF/PDB.cpp
1109–1110

Yes, I think we can remove that comment now, unless @zturner says otherwise.

llvm/trunk/lib/DebugInfo/PDB/Native/ModuleDebugStream.cpp
40–50 ↗(On Diff #190888)

I indeed forgot an include file from the diff. But you're right, this is for supporting empty streams, I should maybe do that in another patch.

aganea updated this revision to Diff 191490.Mar 20 2019, 7:39 AM
aganea marked 4 inline comments as done.
aganea edited the summary of this revision. (Show Details)

Rebased, reduced & adressed comments.

Anything missing for this to be accepted? It seems like this patch is ready.

rnk accepted this revision.Mar 26 2019, 12:56 PM

lgtm, thanks!

Anything missing for this to be accepted? It seems like this patch is ready.

I just missed the update, sorry.

COFF/Writer.cpp
307 ↗(On Diff #191490)

Needs wrapping

This revision is now accepted and ready to land.Mar 26 2019, 12:56 PM
This revision was automatically updated to reflect the committed changes.
aganea marked an inline comment as done.