Page MenuHomePhabricator

[WPD/LowerTypeTests] Delay lowering/removal of type tests until after ICP
ClosedPublic

Authored by tejohnson on Jan 22 2020, 4:21 PM.

Details

Summary

Currently type test assume sequences inserted for devirtualization are
removed during WPD. This patch delays their removal until later in the
optimization pipeline. This is an enabler for upcoming enhancements to
indirect call promotion, for example streamlined promotion guard
sequences that compare against vtable address instead of the target
function, when there are small number of possible vtables (either
determined via WPD or by in-progress type profiling). We need the type
tests to correlate the callsites with the address point offset needed in
the compare sequence, and optionally to associated type summary info
computed during WPD.

This depends on work in D71913 to enable invocation of LowerTypeTests to
drop type test assume sequences, which will now be invoked following ICP
in the ThinLTO post-LTO link pipelines, and also after the existing
export phase LowerTypeTests invocation in regular LTO (which is already
after ICP). We cannot simply move the existing import phase
LowerTypeTests pass later in the ThinLTO post link pipelines, as the
comment in PassBuilder.cpp notes (it must run early because when
performing CFI other passes may disturb the sequences it looks for).

This necessitated adding a new type test resolution "Unknown" that we
can use on the type test assume sequences previously removed by WPD,
that we now want LTT to ignore.

Depends on D71913.

Diff Detail

Event Timeline

tejohnson created this revision.Jan 22 2020, 4:21 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJan 22 2020, 4:21 PM
tejohnson marked an inline comment as done.Jan 22 2020, 4:30 PM
tejohnson added inline comments.
llvm/test/Transforms/WholeProgramDevirt/uniform-retval.ll
28

This change is to distinguish the indirect call here that should have been removed from the @llvm.type.test and @llvm.assume calls that are no longer removed at this point.

This is an enabler for upcoming enhancements to indirect call promotion, for example streamlined promotion guard sequences that compare against vtable address instead of the target function

Can you please describe the whole approach in more detail? At the moment ICP is capable to do (a sort of) devirtualization is replacing indirect vtbl call with sequence of function address comparisons and direct calls.
Are you going to speedup this by means of comparing vtable pointers instead of function pointers (thus eliminating a single load per each vtbl call) or there is also something else in mind? If that's true, what's the next
step? Make ICP pass analyze type test intrinsics?

llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
1695

This change seems to be unrelated

tejohnson marked an inline comment as done.Thu, Jan 30, 6:45 AM

This is an enabler for upcoming enhancements to indirect call promotion, for example streamlined promotion guard sequences that compare against vtable address instead of the target function

Can you please describe the whole approach in more detail? At the moment ICP is capable to do (a sort of) devirtualization is replacing indirect vtbl call with sequence of function address comparisons and direct calls.
Are you going to speedup this by means of comparing vtable pointers instead of function pointers (thus eliminating a single load per each vtbl call) or there is also something else in mind?

That's exactly what we want to do here. We found a relatively significant number of cycles are being spent on virtual function pointer loads in these sequences, and by doing a vtable comparison instead, that is moved off the critical path. I had prototyped something like this in ICP awhile back and found a speedup in an important app.

If that's true, what's the next
step? Make ICP pass analyze type test intrinsics?

There are a few ways to do the alternate ICP compare sequences, one is using statically available info from the vtable definitions in the module that utilize the profiled target. This relies on ThinLTO to import all the relevant vtable definitions. The other is to profile vtable addresses with FDO (not just the target function pointer) - I've got the type profiling implemented, but it needs some cleanup before I send for review. Both of these approaches need the type tests to determine the correct address point offset (the offset in the type test) to use in the compare sequence. And in both cases you want to trade off the number of comparisons needed for the two approaches to determine whether a vtable compare or a target function compare is better. I.e. if there are a lot of vtable definitions that utilize a hot target, it is likely better to do a single target function comparison.

llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
1695

It is needed to have the TypeId available outside this if statement (see the map check below).

Both of these approaches need the type tests to determine the correct address point offset (the offset in the type test) to use in the compare sequence.

I typed this too fast - the offset is in the type metadata, but we need the type tests to correlate them with the indirect call sites.

This patch is to be rebased against D73094

llvm/include/llvm/IR/ModuleSummaryIndex.h
830

I don't think such implicit conversion of Unsat to Unknown is good idea. I guess problem will arise for legacy index files: for those ByteArray resolution will become Unsat and so on. I suggest moving Unknown the end of the list and bumping index version respectively (parseTypeIdSummaryRecord doesn't verify TheKind).

llvm/lib/Passes/PassBuilder.cpp
1381

Can we get rid of two identical passes here (and in other places also)? For instance

MPM.addPass(LowerTypeTestsPass(ExportSummary, nullptr, true));

can both lower type metadata and do final cleanup.

llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
1713

I don't think, I understand this.
It looks like that if (a) we have type.test in the module and (b) we don't have vtable definition with corresponding type metadata in the same module then we'll remove type.test/assume sequence before even getting to ICP. This seems strange in the context of previous discussion because virtual function may be called in different module from one where vtable is defined, like so:

struct A { virtual int foo() { return 42; } };

// calls pA->foo
extern int callFoo(A *pA);
int main() { A a; return callFoo(&a); }
tejohnson marked 3 inline comments as done.Mon, Feb 3, 11:48 AM
tejohnson added inline comments.
llvm/include/llvm/IR/ModuleSummaryIndex.h
830

Sounds like a good idea, will do.

llvm/lib/Passes/PassBuilder.cpp
1381

The problem is that currently under DropTypeTests=true, LTT will drop the type tests up front and return without doing any other lowering. Which is what we want in the already existing callers of LTT with DropTypeTests=true. Here, we want LTT to perform its usual lowering, and only afterwards do the dropping. We want to do this only during regular LTO, but unfortunately we can't just key off of the ExportSummary (e.g. drop at the start if ExportSummary is nullptr, and drop at the end if ExportSummary is non-nullptr), since in some cases here ExportSummary may be nullptr (i.e. regular LTO invoked via the old LTO API). So I would need to introduce another option to essentially say "drop them after lowering". I can certainly do this if you think it would be better (I was thinking about that before, but thought it might be more confusing). WDYT?

llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
1713

We will only be able to do ICP this way if we have the target vtable in the module (similar to how we currently can only do ICP if the target function is in the module). I anticipate doing something similar for vtable defs as to what we do for virtual functions, where a fake call edge is added to the summary based on the value profile results to ensure they are imported before the LTO backend ICP.

evgeny777 added inline comments.Tue, Feb 4, 5:19 AM
llvm/lib/Passes/PassBuilder.cpp
1381

So I would need to introduce another option to essentially say "drop them after lowering". I can certainly do this if you think it would be better (I was thinking about that before, but thought it might be more confusing). WDYT?

Ok, I see. I don't think having this another option is significantly better. So let's leave it as is for now

llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
1713

We will only be able to do ICP this way if we have the target vtable in the module (similar to how we currently can only do ICP if the target function is in the module).

I was thinking of slightly different approach: it seems you have required type information in combined index together with type name in the module, so you probably can add external declarations of required vtables (this may require promotion) to the module in the ICP pass and run ICP over them. Do you think this can work?

tejohnson marked an inline comment as done.Tue, Feb 4, 6:23 AM
tejohnson added inline comments.
llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
1713

Possibly, but we'd still need to have type metadata on those vtable declarations, because the type metadata provides the address point offset which is needed in the comparison sequence.

evgeny777 added inline comments.Tue, Feb 4, 8:06 AM
llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
1713

Possibly, but we'd still need to have type metadata on those vtable declarations, because the type metadata provides the address point offset which is needed in the comparison sequence.

I think it's stored in the index in VFuncId structures. Isn't it?

tejohnson marked an inline comment as done.Tue, Feb 4, 10:36 AM
tejohnson added inline comments.
llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
1713

I think it's stored in the index in VFuncId structures. Isn't it?

No, that holds the offset from the address point to the virtual function within the vtable def, not the address point offset itself, which is what we need from the type metadata. But actually, we need both offsets:

Consider the following example:

class A {
   virtual void foo();
}

void bar(A *a) {
   a->foo();
}

The indirect call sequence will look like (not showing the type test for simplicity):

%0 = bitcast %class.A* %a to i32 (%class.A*)***
 %vtable = load i32 (%class.A*)**, i32 (%class.A*)*** %0
 %1 = load i32 (%class.A*)*, i32 (%class.A*)** %vtable
 %call = tail call i32 %1(%class.A* %a)

With type profiling we will profile the value of %vtable, and figure out that it is associated with the symbol for A's vtable (the profiling support uses symbol table info for this), which is:

@_ZTV1A = dso_local unnamed_addr constant { [3 x i8*] } { [3 x i8*] [i8* null, i8* bitcast ({ i8*, i8* }* @_ZTI1A to i8*), i8* bitcast (i32 (%class.A*)* @_ZN1A3fooEv to i8*)] }, align 8, !type !0

!0 = !{i64 16, !"_ZTS1A"}

When we promote the call using the vtable profile results, we need to add the address point offset (16) from the type metadata to the profiled result which will be the symbol @_ZTV1A, before comparing against %vtable.

We then need to look at the IR and compute the offset to the function address used there (0 in the above IR), and add it to the address point offset, resulting in 16 here, looking in the vtable def to see what function is at the combined offset (_ZN1A3fooEv here), so that we insert a direct call to that function.

For the address point offset, we only have it in the summary for index-only WPD (TypeIdCompatibleVtableMap). However, I don't want to restrict the ICP improvements to that build mode.

For the latter, the VFuncId does summarize the offsets, but we can compute it from the IR as I described above (it's the amount added to %vtable before loading the virtual function pointer) and don't need the summary. And in any case the VFuncId doesn't tell us which function is at the aggregate offset, we need the vtable def to tell us that (or the vTableFuncs array saved on the GlobalVarSummary, again only in index-only WPD mode).

Also, I believe we can support this type profiling based ICP in non-ThinLTO mode as well - with my recent changes to make WPD rely on vcall visibility info, we should be able to insert both the type tests and the type metadata into the code stream in non-ThinLTO mode. In that case, similar to the way ICP currently behaves for target functions, the def would have to happen to already be in the current module for the vtable compare promotion to be occur.

For the above reasons, I'd prefer to rely on the type metadata on vtable defs, with the defs imported as much as possible in ThinLTO mode.

tejohnson updated this revision to Diff 242493.Tue, Feb 4, 6:33 PM

Rebase and implement suggestion (move Unknown to end and bump index version)

evgeny777 added inline comments.Wed, Feb 5, 12:26 AM
llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
1713

Okay, got it. One question: why do you insist on removing such type.test/assume sequences here, instead of simply ignoring them in ICP and eliminating them altogether in LowerTypeTests?

tejohnson marked an inline comment as done.Wed, Feb 5, 6:26 AM
tejohnson added inline comments.
llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
1713

ICP isn't the issue, it is what will happen in LTT. Without any associated type metadata, as noted in my comment here, LTT will think they are Unsat and lower to false, which is incorrect and results in an llvm.assume(false) which will turn into an unreachable, resulting in runtime failures.

This revision is now accepted and ready to land.Wed, Feb 5, 7:16 AM
This revision was automatically updated to reflect the committed changes.

This makes lld crash when linking chromium's base_unittests and probably does the same for most other binaries that use both thinlto and cfi: https://bugs.chromium.org/p/chromium/issues/detail?id=1049434

This makes lld crash when linking chromium's base_unittests and probably does the same for most other binaries that use both thinlto and cfi: https://bugs.chromium.org/p/chromium/issues/detail?id=1049434

Reverted at 25aa2eef993e17708889abf56ed1ffad5074a9f4. Will investigate using repro @thakis sent off patch.

tejohnson marked an inline comment as done.Tue, Feb 11, 10:54 AM

This makes lld crash when linking chromium's base_unittests and probably does the same for most other binaries that use both thinlto and cfi: https://bugs.chromium.org/p/chromium/issues/detail?id=1049434

Reverted at 25aa2eef993e17708889abf56ed1ffad5074a9f4. Will investigate using repro @thakis sent off patch.

Recommitted with fix and additional test case at 80d0a137a5aba6998fadb764f1e11cb901aae233.

llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
1999

The fix needed for the Chromium issue was to guard the TypeIdSummary creation here by whether the TypeID exists in the TypeIdMap (which makes it match the comment in fact), as we don't want to create a summary if the type id is not used on a global (in which case it should in fact be Unsat). The equivalent code in the index-only WPD is essentially already guarded by that condition, because of the way the CallSlots are created (and in fact there is an assert in that code that we have a use on a vtable, i.e. that a TypeIdCompatibleVtableSummary is found for the TypeID).

aganea added a subscriber: aganea.Tue, Feb 18, 6:44 PM

There seems to be still an issue with this patch, when linking the LLVM unit tests on a two-stage build, it ends with:

FAILED: tools/clang/unittests/StaticAnalyzer/StaticAnalysisTests.exe
cmd.exe /C "cd . && "C:\Program Files\CMake\bin\cmake.exe" -E vs_link_exe --intdir=tools\clang\unittests\StaticAnalyzer\CMakeFiles\StaticAnalysisTests.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100183~1.0\x64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100183~1.0\x64\mt.exe --manifests  -- D:\llvm-project\buildninjaRel\bin\lld-link.exe /nologo tools\clang\unittests\StaticAnalyzer\CMakeFiles\StaticAnalysisTests.dir\AnalyzerOptionsTest.cpp.obj tools\clang\unittests\StaticAnalyzer\CMakeFiles\StaticAnalysisTests.dir\CallDescriptionTest.cpp.obj tools\clang\unittests\StaticAnalyzer\CMakeFiles\StaticAnalysisTests.dir\StoreTest.cpp.obj tools\clang\unittests\StaticAnalyzer\CMakeFiles\StaticAnalysisTests.dir\RegisterCustomCheckersTest.cpp.obj tools\clang\unittests\StaticAnalyzer\CMakeFiles\StaticAnalysisTests.dir\SymbolReaperTest.cpp.obj tools\clang\unittests\StaticAnalyzer\CMakeFiles\StaticAnalysisTests.dir\D_\llvm-project\llvm\resources\windows_version_resource.rc.res  /out:tools\clang\unittests\StaticAnalyzer\StaticAnalysisTests.exe /implib:tools\clang\unittests\StaticAnalyzer\StaticAnalysisTests.lib /pdb:tools\clang\unittests\StaticAnalyzer\StaticAnalysisTests.pdb /version:0.0  /machine:x64 -fuse-ld=lld /STACK:10000000 /DEBUG /OPT:REF /OPT:ICF /lldltocache:D:/llvm-project/buildninjaRelMiMalloc/lto.cache /INCREMENTAL:NO /subsystem:console  lib\LLVMSupport.lib  lib\LLVMSupport.lib  lib\gtest_main.lib  lib\gtest.lib  lib\clangBasic.lib  lib\clangAnalysis.lib  lib\clangAST.lib  lib\clangASTMatchers.lib  lib\clangCrossTU.lib  lib\clangFrontend.lib  lib\clangSerialization.lib  lib\clangStaticAnalyzerCore.lib  lib\clangStaticAnalyzerFrontend.lib  lib\clangTooling.lib  lib\clangStaticAnalyzerCheckers.lib  lib\clangStaticAnalyzerCore.lib  lib\clangCrossTU.lib  lib\clangIndex.lib  lib\clangFrontend.lib  lib\clangParse.lib  lib\clangSerialization.lib  lib\clangSema.lib  lib\clangAnalysis.lib  lib\clangASTMatchers.lib  lib\clangEdit.lib  lib\clangDriver.lib  version.lib  lib\LLVMOption.lib  lib\clangFormat.lib  lib\clangToolingInclusions.lib  lib\clangToolingCore.lib  lib\clangAST.lib  lib\LLVMFrontendOpenMP.lib  lib\LLVMTransformUtils.lib  lib\LLVMAnalysis.lib  lib\LLVMProfileData.lib  lib\LLVMObject.lib  lib\LLVMBitReader.lib  lib\LLVMMCParser.lib  lib\LLVMTextAPI.lib  lib\clangRewrite.lib  lib\clangLex.lib  lib\clangBasic.lib  lib\LLVMCore.lib  lib\LLVMRemarks.lib  lib\LLVMBitstreamReader.lib  lib\LLVMMC.lib  lib\LLVMBinaryFormat.lib  lib\LLVMDebugInfoCodeView.lib  lib\LLVMDebugInfoMSF.lib  lib\LLVMSupport.lib  psapi.lib  shell32.lib  ole32.lib  uuid.lib  advapi32.lib  delayimp.lib  -delayload:shell32.dll  -delayload:ole32.dll  lib\LLVMDemangle.lib  kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib && cd ."
LINK: command "D:\llvm-project\buildninjaRel\bin\lld-link.exe /nologo tools\clang\unittests\StaticAnalyzer\CMakeFiles\StaticAnalysisTests.dir\AnalyzerOptionsTest.cpp.obj tools\clang\unittests\StaticAnalyzer\CMakeFiles\StaticAnalysisTests.dir\CallDescriptionTest.cpp.obj tools\clang\unittests\StaticAnalyzer\CMakeFiles\StaticAnalysisTests.dir\StoreTest.cpp.obj tools\clang\unittests\StaticAnalyzer\CMakeFiles\StaticAnalysisTests.dir\RegisterCustomCheckersTest.cpp.obj tools\clang\unittests\StaticAnalyzer\CMakeFiles\StaticAnalysisTests.dir\SymbolReaperTest.cpp.obj tools\clang\unittests\StaticAnalyzer\CMakeFiles\StaticAnalysisTests.dir\D_\llvm-project\llvm\resources\windows_version_resource.rc.res /out:tools\clang\unittests\StaticAnalyzer\StaticAnalysisTests.exe /implib:tools\clang\unittests\StaticAnalyzer\StaticAnalysisTests.lib /pdb:tools\clang\unittests\StaticAnalyzer\StaticAnalysisTests.pdb /version:0.0 /machine:x64 -fuse-ld=lld /STACK:10000000 /DEBUG /OPT:REF /OPT:ICF /lldltocache:D:/llvm-project/buildninjaRelMiMalloc/lto.cache /INCREMENTAL:NO /subsystem:console lib\LLVMSupport.lib lib\LLVMSupport.lib lib\gtest_main.lib lib\gtest.lib lib\clangBasic.lib lib\clangAnalysis.lib lib\clangAST.lib lib\clangASTMatchers.lib lib\clangCrossTU.lib lib\clangFrontend.lib lib\clangSerialization.lib lib\clangStaticAnalyzerCore.lib lib\clangStaticAnalyzerFrontend.lib lib\clangTooling.lib lib\clangStaticAnalyzerCheckers.lib lib\clangStaticAnalyzerCore.lib lib\clangCrossTU.lib lib\clangIndex.lib lib\clangFrontend.lib lib\clangParse.lib lib\clangSerialization.lib lib\clangSema.lib lib\clangAnalysis.lib lib\clangASTMatchers.lib lib\clangEdit.lib lib\clangDriver.lib version.lib lib\LLVMOption.lib lib\clangFormat.lib lib\clangToolingInclusions.lib lib\clangToolingCore.lib lib\clangAST.lib lib\LLVMFrontendOpenMP.lib lib\LLVMTransformUtils.lib lib\LLVMAnalysis.lib lib\LLVMProfileData.lib lib\LLVMObject.lib lib\LLVMBitReader.lib lib\LLVMMCParser.lib lib\LLVMTextAPI.lib lib\clangRewrite.lib lib\clangLex.lib lib\clangBasic.lib lib\LLVMCore.lib lib\LLVMRemarks.lib lib\LLVMBitstreamReader.lib lib\LLVMMC.lib lib\LLVMBinaryFormat.lib lib\LLVMDebugInfoCodeView.lib lib\LLVMDebugInfoMSF.lib lib\LLVMSupport.lib psapi.lib shell32.lib ole32.lib uuid.lib advapi32.lib delayimp.lib -delayload:shell32.dll -delayload:ole32.dll lib\LLVMDemangle.lib kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib /MANIFEST /MANIFESTFILE:tools\clang\unittests\StaticAnalyzer\StaticAnalysisTests.exe.manifest" failed (exit code 0) with the following output:
lld-link: warning: ignoring unknown argument '-fuse-ld=lld'
LLVM ERROR: Second argument of llvm.type.test must be a metadata string
Stack dump:
0.      Running pass 'Lower type metadata' on module 'tools\clang\unittests\StaticAnalyzer\CMakeFiles\StaticAnalysisTests.dir\RegisterCustomCheckersTest.cpp.obj'.
 #0 0x00007ff62cc8f056 HandleAbort D:\llvm-project\llvm\lib\Support\Windows\Signals.inc:408:0
 #1 0x00007ff62f6912f1 raise D:\llvm-project\buildninjaRel\minkernel\crts\ucrt\src\appcrt\misc\signal.cpp:547:0
 #2 0x00007ff62f687c08 abort D:\llvm-project\buildninjaRel\minkernel\crts\ucrt\src\appcrt\startup\abort.cpp:71:0
 #3 0x00007ff62cc88e98 llvm::report_fatal_error(class llvm::Twine const &, bool) D:\llvm-project\llvm\lib\Support\ErrorHandling.cpp:126:0
 #4 0x00007ff62cc88cb1 llvm::report_fatal_error(char const *, bool) D:\llvm-project\llvm\lib\Support\ErrorHandling.cpp:83:0
 #5 0x00007ff62ec3db75 `anonymous namespace'::LowerTypeTestsModule::lower D:\llvm-project\llvm\lib\Transforms\IPO\LowerTypeTests.cpp:0:0
 #6 0x00007ff62ec3e0ce `anonymous namespace'::LowerTypeTests::runOnModule D:\llvm-project\llvm\lib\Transforms\IPO\LowerTypeTests.cpp:539:0
 #7 0x00007ff62d3d6f82 llvm::legacy::PassManagerImpl::run(class llvm::Module &) D:\llvm-project\llvm\lib\IR\LegacyPassManager.cpp:1695:0
 #8 0x00007ff62d6350c0 `anonymous namespace'::opt D:\llvm-project\llvm\lib\LTO\LTOBackend.cpp:314:0
 #9 0x00007ff62d637a48 llvm::lto::thinBackend(struct llvm::lto::Config const &, unsigned int, class std::function<(unsigned int)>, class llvm::Module &, class llvm::ModuleSummaryIndex const &, class llvm::StringMap<class std::unordered_set<unsigned __int64, struct std::hash<unsigned __int64>, struct std::equal_to<unsigned __int64>, class std::allocator<unsigned __int64>>, class llvm::MallocAllocator> const &, class llvm::DenseMap<unsigned __int64, class llvm::GlobalValueSummary *, struct llvm::DenseMapInfo<unsigned __int64>, struct llvm::detail::DenseMapPair<unsigned __int64, class llvm::GlobalValueSummary *>> const &, class llvm::MapVector<class llvm::StringRef, class llvm::BitcodeModule, class llvm::DenseMap<class llvm::StringRef, unsigned int, struct llvm::DenseMapInfo<class llvm::StringRef>, struct llvm::detail::DenseMapPair<class llvm::StringRef, unsigned int>>, class std::vector<struct std::pair<class llvm::StringRef, class llvm::BitcodeModule>, class std::allocator<struct std::pair<class llvm::StringRef, class llvm::BitcodeModule>>>> &) D:\llvm-project\llvm\lib\LTO\LTOBackend.cpp:555:0
#10 0x00007ff62cefc945 `anonymous namespace'::InProcessThinBackend::runThinLTOBackendThread::<unnamed-tag>::operator() D:\llvm-project\llvm\lib\LTO\LTO.cpp:1124:0
#11 0x00007ff62cefc7d3 std::_Func_impl_no_alloc<std::_Binder<std::_Unforced,`lambda at D:\llvm-project\llvm\lib\LTO\LTO.cpp:1157:9',llvm::BitcodeModule &,std::reference_wrapper<llvm::ModuleSummaryIndex>,std::reference_wrapper<const llvm::StringMap<std::unordered_set<unsigned long long,std::hash<unsigned long long>,std::equal_to<unsigned long long>,std::allocator<unsigned long long> >,llvm::MallocAllocator> >,std::reference_wrapper<const llvm::DenseSet<llvm::ValueInfo,llvm::DenseMapInfo<llvm::ValueInfo> > >,std::reference_wrapper<const std::map<unsigned long long,llvm::GlobalValue::LinkageTypes,std::less<unsigned long long>,std::allocator<std::pair<const unsigned long long,llvm::GlobalValue::LinkageTypes> > > >,std::reference_wrapper<const llvm::DenseMap<unsigned long long,llvm::GlobalValueSummary *,llvm::DenseMapInfo<unsigned long long>,llvm::detail::DenseMapPair<unsigned long long,llvm::GlobalValueSummary *> > >,std::reference_wrapper<llvm::MapVector<llvm::StringRef,llvm::BitcodeModule,llvm::DenseMap<llvm::StringRef,unsigned int,llvm::DenseMapInfo<llvm::StringRef>,llvm::detail::DenseMapPair<llvm::StringRef,unsigned int> >,std::vector<std::pair<llvm::StringRef,llvm::BitcodeModule>,std::allocator<std::pair<llvm::StringRef,llvm::BitcodeModule> > > > > >,void>::_Do_call C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.24.28314\include\functional:926:0
#12 0x00007ff62d63d427 std::packaged_task<(void)>::operator()(void) C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.24.28314\include\future:1394:0
#13 0x00007ff62d63d33d std::thread::_Invoke<std::tuple<`lambda at D:\llvm-project\llvm\lib\Support\ThreadPool.cpp:30:26'>,0> C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.24.28314\include\thread:43:0
#14 0x00007ff62f682f04 thread_start<unsigned int (__cdecl*)(void *),1> D:\llvm-project\buildninjaRel\minkernel\crts\ucrt\src\appcrt\startup\thread.cpp:97:0
#15 0x00007ffa95737bd4 (C:\WINDOWS\System32\KERNEL32.DLL+0x17bd4)
#16 0x00007ffa972eced1 (C:\WINDOWS\SYSTEM32\ntdll.dll+0x6ced1)
[4617/4685] Linking CXX executable tools\clang\unittests\Analysis\ClangAnalysisTests.exe
lld-link: warning: ignoring unknown argument '-fuse-ld=lld'
ninja: build stopped: subcommand failed.

I'm at rG6f846c85045ca814b57ee495d9f85a7b0496e947. Building on Windows 10 version 1909, on a VS2019 cmd shell, version 16.4.4. The first stage is built with LLVM 9.0.1.

The cmake I used for stage 1:

set LLVM=c:/Program Files/LLVM
set OPT_AVX=/GS- /D_ITERATOR_DEBUG_LEVEL=0 /arch:AVX
cmake "-DLLVM_LIT_ARGS=-sv -j 36" -GNinja %ROOT%/llvm -DCMAKE_BUILD_TYPE=Release -DLLVM_OPTIMIZED_TABLEGEN=ON -DLLVM_ENABLE_ASSERTIONS=ON -DLLVM_ENABLE_LIBXML2=OFF -DCMAKE_C_COMPILER="%LLVM%/bin/clang-cl.EXE" -DCMAKE_CXX_COMPILER="%LLVM%/bin/clang-cl.EXE" -DCMAKE_LINKER="%LLVM%/bin/lld-link.EXE" -DLLVM_ENABLE_PROJECTS="llvm;clang;lld" -DLLVM_ENABLE_PDB=ON -DLLVM_ENABLE_LLD=ON -DLLVM_USE_CRT_RELEASE=MT -DCMAKE_CXX_FLAGS="%OPT_AVX%" -DCMAKE_C_FLAGS="%OPT_AVX%"

For stage 2:

set OPT_SKYLAKE=/GS- /D_ITERATOR_DEBUG_LEVEL=0 -Xclang -O3 -Xclang -fwhole-program-vtables -fstrict-aliasing -march=skylake-avx512
set LLVM_LOCAL=%ROOT%/buildninjaRel
cmake "-DLLVM_LIT_ARGS=-sv -j 36" -G"Ninja" %ROOT%/llvm -DCMAKE_BUILD_TYPE=Release -DLLVM_OPTIMIZED_TABLEGEN=true -DLLVM_ENABLE_LIBXML2=OFF -DLLVM_USE_CRT_RELEASE=MT -DCMAKE_C_COMPILER="%LLVM_LOCAL%/bin/clang-cl.exe" -DCMAKE_CXX_COMPILER="%LLVM_LOCAL%/bin/clang-cl.exe" -DCMAKE_LINKER="%LLVM_LOCAL%/bin/lld-link.exe" -DLLVM_ENABLE_LLD=ON -DLLVM_ENABLE_PDB=ON -DLLVM_ENABLE_PROJECTS="llvm;clang;lld" -DCMAKE_CXX_FLAGS="%OPT_SKYLAKE%" -DCMAKE_C_FLAGS="%OPT_SKYLAKE%" -DLLVM_ENABLE_LTO=THIN -DCLANG_TABLEGEN="%LLVM_LOCAL%/bin/clang-tblgen.exe" -DLLVM_TABLEGEN="%LLVM_LOCAL%/bin/llvm-tblgen.exe"

Building both stages with ninja check-all.

FYI I have reproduced the failure, and am starting to debug.