This is an archive of the discontinued LLVM Phabricator instance.

Only call FlushFileBuffers when writing executables
ClosedPublic

Authored by aganea on Oct 25 2018, 1:10 PM.

Details

Summary

This is a follow-up for D42925 (Call FlushFileBuffers on readwrite file mappings)

Previously, FlushFileBuffers was called in all cases when writing a file. The objective was to go around a bug in the Windows kernel, as described here. However that is required only when writing EXEs, any other file types don't need flushing.

Recently we ran into a performance issue with LLD: on older SSDs, FlushFileBuffers was causing very slow link times. For the record, the PDB is 1.5 GB, the EXE is 450 MB.

  Input File Reading:          3314 ms (  4.2%)
  Code Layout:                 1113 ms (  1.4%)
  PDB Emission (Cumulative):  61715 ms ( 78.0%)
    Add Objects:              10636 ms ( 13.4%)
      Type Merging:            4896 ms (  6.2%)
      Symbol Merging:          4780 ms (  6.0%)
    TPI Stream Layout:          960 ms (  1.2%)
    Globals Stream Layout:     3353 ms (  4.2%)
    Commit to Disk:           44875 ms ( 56.7%)     <<<<<<< writing the PDB
  Commit Output File:         12526 ms ( 15.8%)
-------------------------------------------------
Total Link Time:              79094 ms (100.0%)

This was caused by our SSD brand, which has known performance issues when the disk becomes close to full. At that point, the SSD's write speed goes down from 180 MB/s to <30 MB/s.


With this patch, FlushFileBuffers is now called only for EXEs. I've conveniently changed the mapped_file_region constructor to take an extra parameter, to force clients to acknowledge the change.

After the change, figures became a bit better (the test was ran in the same conditions):

  Input File Reading:          6754 ms ( 15.7%)
  Code Layout:                 1851 ms (  4.3%)
  PDB Emission (Cumulative):  28355 ms ( 65.7%)
    Add Objects:              14070 ms ( 32.6%)
      Type Merging:            6290 ms ( 14.6%)
      Symbol Merging:          6862 ms ( 15.9%)
    TPI Stream Layout:          994 ms (  2.3%)
    Globals Stream Layout:     3796 ms (  8.8%)
    Commit to Disk:            7620 ms ( 17.7%)     <<<<<<< writing the PDB
  Commit Output File:          5667 ms ( 13.1%)
-------------------------------------------------
Total Link Time:              43149 ms (100.0%)

I've tried several runs back to back, with and without the current patch, to ensure the slowdown was still there.

Evidently, you would still pay the price for writing the PDB to disk, however that operation is now queued at the OS level and the build script can carry on with other tasks.

Also please note that newer SSD brands are unaffected by this. The performance is the same with or without the patch when using a Samsung SSD on a NVMe interface:

  Input File Reading:          3656 ms ( 11.1%)
  Code Layout:                 1479 ms (  4.5%)
  PDB Emission (Cumulative):  27071 ms ( 82.4%)
    Add Objects:              11794 ms ( 35.9%)
      Type Merging:            5685 ms ( 17.3%)
      Symbol Merging:          5093 ms ( 15.5%)
    TPI Stream Layout:          981 ms (  3.0%)
    Globals Stream Layout:     3457 ms ( 10.5%)
    Commit to Disk:            8449 ms ( 25.7%)
  Commit Output File:           328 ms (  1.0%)
-------------------------------------------------
Total Link Time:              32849 ms (100.0%)

Diff Detail

Repository
rL LLVM

Event Timeline

aganea created this revision.Oct 25 2018, 1:10 PM

Bruce Dawson also tells me that Microsoft has fixed this bug in Windows 10 RS5/1810. Can we additionally disable this workaround entirely behind a runtime OS version check?

Bruce Dawson also tells me that Microsoft has fixed this bug in Windows 10 RS5/1810. Can we additionally disable this workaround entirely behind a runtime OS version check?

Looks like there's no 1810 yet. Redstone 5 is 1809 build 17763.55, as described here. I'll wait for Bruce's answer before adding the test.

ruiu added a subscriber: ruiu.Oct 25 2018, 2:04 PM

Can I ask what was the reason to call FlushFileBuffer on file close in the first place? My understanding is that, if the last open file handler is closed, anything that was written to the file was automatically flushed at the moment. Or, it is not the case?

Can I ask what was the reason to call FlushFileBuffer on file close in the first place? My understanding is that, if the last open file handler is closed, anything that was written to the file was automatically flushed at the moment. Or, it is not the case?

See the patch description here: https://reviews.llvm.org/D42925

ruiu added a comment.Oct 25 2018, 2:13 PM

Looks like there's a way to get a filename from a file handle on Windows (https://docs.microsoft.com/en-us/windows/desktop/api/winbase/nf-winbase-getfileinformationbyhandleex).

If you use this, you can check if a file name ends with ".exe", so you don't need to pass "is executable" bit around, I guess?

Looks like there's a way to get a filename from a file handle on Windows (https://docs.microsoft.com/en-us/windows/desktop/api/winbase/nf-winbase-getfileinformationbyhandleex).

If you use this, you can check if a file name ends with ".exe", so you don't need to pass "is executable" bit around, I guess?

Theoretically you can have executables that don't end in in .exe, but I guess this probably covers 99% of cases. Maybe it's ok, I'm not sure.

aganea updated this revision to Diff 171330.Oct 26 2018, 12:23 PM

What about we check if the first mmap'ed page starts with the PE signature? This leaves no ambiguity. Please see the updated diff.

I ripped the PE compare code from LLVMBinaryFormat otherwise referencing it would add an extra dependency when using LLVMSupport. Is there a better way?

I also added GetWindowsOSVersion() which I've also ripped from lldb/trunk/source/Host/windows/HostInfoWindows.cpp. However I've added proper detection for Windows 10 on the top, as GetVersionEx is deprecated on Windows 10.

rnk added inline comments.Oct 31 2018, 3:08 PM
lib/Support/Windows/Path.inc
857–859 ↗(On Diff #171330)

Rather than having two static locals which both emit guard variable initialization, I would recommend making a static local bool that simply is the return value.

You could probably do this:

static bool Res = *(GetWindowsOSVersion() < llvm::VersionTuple(10, 0, 0, 17763));
lib/Support/Windows/WindowsSupport.h
78–80 ↗(On Diff #171330)

I was going to say, surely GetVersionInfoEx will give us this info, but now I see the mess that is versionhelpers.h, and I assume you tried that, and they don't expose the build number.

aganea updated this revision to Diff 172433.Nov 2 2018, 2:22 PM
aganea marked an inline comment as done.Nov 2 2018, 2:27 PM
aganea added inline comments.
lib/Support/Windows/WindowsSupport.h
78–80 ↗(On Diff #171330)

In addition to GetVersion and GetVersionEx being deprecated, applications now need to be manifested in order to target Windows 10. If the application don't provide a manifest, the version helpers functions will report Windows 8. In this case we want the actual version at the OS-level, not at the application-level. RtlGetVersion is the only way of doing that correctly.

rnk accepted this revision.Nov 2 2018, 2:27 PM

Looks good, but when you commit it, you probably will want to keep an eye out for warnings or build breakages in other configurations. This code will be compiled with GCC, MSVC, and clang, and I can imagine a few ways the WindowsSupport.h code could raise warnings.

lib/Support/Windows/WindowsSupport.h
76 ↗(On Diff #172433)

Does this really not conflict with the windows.h include above?

95 ↗(On Diff #172433)

Oh, do these need to be #ifdef _MSC_VER? Will we get the deprecation warning in a clang-cl build?

This revision is now accepted and ready to land.Nov 2 2018, 2:27 PM
aganea added a comment.Nov 2 2018, 2:42 PM
In D53727#1285988, @rnk wrote:

Looks good, but when you commit it, you probably will want to keep an eye out for warnings or build breakages in other configurations. This code will be compiled with GCC, MSVC, and clang, and I can imagine a few ways the WindowsSupport.h code could raise warnings.

Very good point. I'll try different permutations with different Windows Kits and different compilers to ensure there are no issues.

lib/Support/Windows/WindowsSupport.h
76 ↗(On Diff #172433)

Actually, no: (on my machine,)

  • windows.h in is C:\Program Files (x86)\Windows Kits\10\Include\10.0.16299.0\um\
  • ntstatus.h is in C:\Program Files (x86)\Windows Kits\10\Include\10.0.16299.0\shared\

Nobody includes ntstatus.h anywhere, and STATUS_SUCCESS isn't used either.

95 ↗(On Diff #172433)

_MSC_VER is the compiler version, however the deprecation happens in the Windows Kits, which are actually separated from the compiler.

(C:\Program Files (x86)\Windows Kits\10\Include\10.0.16299.0\um\sysinfoapi.h)
NOT_BUILD_WINDOWS_DEPRECATE
WINBASEAPI
__drv_preferredFunction("IsWindows*", "Deprecated. Use VerifyVersionInfo* or IsWindows* macros from VersionHelpers.")
BOOL
WINAPI
GetVersionExW(
    _Inout_ LPOSVERSIONINFOW lpVersionInformation
    );

Currently when compiling LLVM/Clang on Windows, we're using whichever Windows Kit is set as "default" in the environment variables:

WindowsSdkBinPath=C:\Program Files (x86)\Windows Kits\10\bin\
WindowsSdkDir=C:\Program Files (x86)\Windows Kits\10\
WindowsSDKLibVersion=10.0.17134.0\
WindowsSdkVerBinPath=C:\Program Files (x86)\Windows Kits\10\bin\10.0.17134.0\
WindowsSDKVersion=10.0.17134.0\
This revision was automatically updated to reflect the committed changes.