This is just a proof of concept to demonstrate generating GHASHes at link time (in parallel).
In my test, all the source OBJs are from MSVC, so there's no prior GHASH stream.
All in all, the link is about 5 sec faster (in my large DLL test), even though we're now generating the GHASHes at link time.
The end gain also comes from the Types/IDs hash tables, which are much faster with GHASHes.
I've also optimized the Type hash table (through the new GlobalTypeDenseMap class) by making the buckets smaller (8 bytes vs. 12 bytes for regular GHASH). This makes merging about 35% faster.
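To illustrate why the bucket size matters, here is a minimal sketch of a 12-byte bucket next to an 8-byte one. The struct names, the helper functions, and the idea of keeping only a 32-bit tag of the hash are assumptions made for this example; they are not taken from GlobalTypeDenseMap, which may reach 8 bytes in a different way.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// 12-byte layout: an 8-byte GHASH stored next to a 4-byte type index.
struct WideBucket {
  std::array<uint8_t, 8> Hash;
  uint32_t Index;
};
static_assert(sizeof(WideBucket) == 12, "8-byte hash + 4-byte index");

// Hypothetical 8-byte layout: keep only the upper 32 bits of the hash as a
// tag. A tag match would still have to be confirmed against the full GHASH
// stored elsewhere (for example in an array indexed by Index).
struct NarrowBucket {
  uint32_t HashTag;
  uint32_t Index;
};
static_assert(sizeof(NarrowBucket) == 8, "4-byte tag + 4-byte index");

// Builds a narrow bucket from a full 64-bit hash value and a type index.
inline NarrowBucket makeNarrow(uint64_t FullHash, uint32_t Index) {
  return NarrowBucket{static_cast<uint32_t>(FullHash >> 32), Index};
}

// Cheap first-level check: true if the bucket's tag matches the hash.
inline bool tagMatches(const NarrowBucket &B, uint64_t FullHash) {
  return B.HashTag == static_cast<uint32_t>(FullHash >> 32);
}
```

With 64-byte cache lines, eight such buckets fit in one line instead of five, which is one plausible reason for faster probing.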
I've also thrown in parallel sorting of the globals stream, which makes this pass 2x faster.
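As a rough illustration of the parallel sorting idea, the sketch below sorts two halves of a vector concurrently and then merges them. It uses only the standard library so that it stays self-contained; the function name and the two-way split are assumptions for this example, and the patch itself would rely on LLVM's own parallel helpers (llvm/Support/Parallel.h) rather than this code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <future>
#include <vector>

// Sorts Values by sorting the first half on a second thread while the
// calling thread sorts the second half, then merging the two sorted halves.
inline void parallelSortSketch(std::vector<uint32_t> &Values) {
  auto Mid = Values.begin() + Values.size() / 2;
  // The two halves do not overlap, so they can be sorted concurrently.
  auto FirstHalf = std::async(std::launch::async,
                              [&] { std::sort(Values.begin(), Mid); });
  std::sort(Mid, Values.end());
  FirstHalf.get(); // wait for the other thread before merging
  std::inplace_merge(Values.begin(), Mid, Values.end());
}
```

A real implementation would split recursively and fall back to a serial sort below some size threshold.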
Before patch, with regular Type merging:
-------------------------------------------------
  Input File Reading:          1658 ms (  4.7%)
  Code Layout:                  621 ms (  1.8%)
  PDB Emission (Cumulative):  30380 ms ( 86.7%)
    Add Objects:              22615 ms ( 64.6%)
      Type Merging:           19205 ms ( 54.8%)
      Symbol Merging:          3385 ms (  9.7%)
    TPI Stream Layout:          897 ms (  2.6%)
    Globals Stream Layout:     1418 ms (  4.1%)
    Commit to Disk:            4559 ms ( 13.0%)
  Commit Output File:          1717 ms (  4.9%)
-------------------------------------------------
Total Link Time:              35021 ms (100.0%)
With this patch, GHASH-only merging (12-byte buckets, globals stream not sorted in parallel):
-------------------------------------------------
  Input File Reading:          1647 ms (  5.4%)
  Code Layout:                  576 ms (  1.9%)
  PDB Emission (Cumulative):  27537 ms ( 89.6%)
    Add Objects:              21088 ms ( 68.6%)
      Global hashing:         10723 ms ( 34.9%)  <<<< parallel
      Type Merging:            7419 ms ( 24.1%)  <<<< 12-byte buckets
      Symbol Merging:          2861 ms (  9.3%)
    TPI Stream Layout:          941 ms (  3.1%)
    Globals Stream Layout:     1545 ms (  5.0%)  <<<< no parallel
    Commit to Disk:            3184 ms ( 10.4%)
  Commit Output File:           353 ms (  1.1%)
-------------------------------------------------
Total Link Time:              30728 ms (100.0%)
With this patch, GHASH-only merging (8-byte buckets, globals stream sorted in parallel):
-------------------------------------------------
  Input File Reading:          1620 ms (  5.6%)
  Code Layout:                  598 ms (  2.1%)
  PDB Emission (Cumulative):  23715 ms ( 81.6%)
    Add Objects:              17933 ms ( 61.7%)
      Global hashing:          9734 ms ( 33.5%)  <<<< parallel
      Type Merging:            5293 ms ( 18.2%)  <<<< 8-byte buckets
      Symbol Merging:          2823 ms (  9.7%)
    TPI Stream Layout:          900 ms (  3.1%)
    Globals Stream Layout:      953 ms (  3.3%)  <<<< parallel
    Commit to Disk:            3161 ms ( 10.9%)
  Commit Output File:          2512 ms (  8.6%)
-------------------------------------------------
Total Link Time:              29067 ms (100.0%)
Sorry for the messy patch; I am just looking for overall advice. If this is the right direction, I'll split the patch into smaller pieces.
What is the point of making it selectable by users? It seems to me that you should pick the one you think is best and just use it.