This is a follow-up for D42925 (Call FlushFileBuffers on readwrite file mappings)
Previously, FlushFileBuffers was called in all cases when writing a file. The objective was to go around a bug in the Windows kernel, as described here. However that is required only when writing EXEs, any other file types don't need flushing.
Recently we ran into a performance issue with LLD: on older SSDs, FlushFileBuffers was causing very slow link times. For the record, the PDB is 1.5 GB, the EXE is 450 MB.
Input File Reading: 3314 ms ( 4.2%) Code Layout: 1113 ms ( 1.4%) PDB Emission (Cumulative): 61715 ms ( 78.0%) Add Objects: 10636 ms ( 13.4%) Type Merging: 4896 ms ( 6.2%) Symbol Merging: 4780 ms ( 6.0%) TPI Stream Layout: 960 ms ( 1.2%) Globals Stream Layout: 3353 ms ( 4.2%) Commit to Disk: 44875 ms ( 56.7%) <<<<<<< writing the PDB Commit Output File: 12526 ms ( 15.8%) ------------------------------------------------- Total Link Time: 79094 ms (100.0%)
This was caused by our SSD brand, which has known performance issues when the disk becomes close to full. At that point, the SSD's write speed goes down from 180 MB/s to <30 MB/s.
With this patch, FlushFileBuffers is now called only for EXEs. I've conveniently changed the mapped_file_region constructor to take an extra parameter, to force clients to acknowledge the change.
After the change, figures became a bit better (the test was ran in the same conditions):
Input File Reading: 6754 ms ( 15.7%) Code Layout: 1851 ms ( 4.3%) PDB Emission (Cumulative): 28355 ms ( 65.7%) Add Objects: 14070 ms ( 32.6%) Type Merging: 6290 ms ( 14.6%) Symbol Merging: 6862 ms ( 15.9%) TPI Stream Layout: 994 ms ( 2.3%) Globals Stream Layout: 3796 ms ( 8.8%) Commit to Disk: 7620 ms ( 17.7%) <<<<<<< writing the PDB Commit Output File: 5667 ms ( 13.1%) ------------------------------------------------- Total Link Time: 43149 ms (100.0%)
I've tried several runs back to back, with and without the current patch, to ensure the slowdown was still there.
Evidently, you would still pay the price for writing the PDB to disk, however that operation is now queued at the OS level and the build script can carry on with other tasks.
Also please note that newer SSD brands are unaffected by this. The performance is the same with or without the patch when using a Samsung SSD on a NVMe interface:
Input File Reading: 3656 ms ( 11.1%) Code Layout: 1479 ms ( 4.5%) PDB Emission (Cumulative): 27071 ms ( 82.4%) Add Objects: 11794 ms ( 35.9%) Type Merging: 5685 ms ( 17.3%) Symbol Merging: 5093 ms ( 15.5%) TPI Stream Layout: 981 ms ( 3.0%) Globals Stream Layout: 3457 ms ( 10.5%) Commit to Disk: 8449 ms ( 25.7%) Commit Output File: 328 ms ( 1.0%) ------------------------------------------------- Total Link Time: 32849 ms (100.0%)
Rather than having two static locals which both emit guard variable initialization, I would recommend making a static local bool that simply is the return value.
You could probably do this: