The NPM LTO pipeline has a lot of fixme's and missing passes, causing a lot of regressions after the switch in c70737b. Notably unrolling and vectorization were both disabled, but many other passes are missing compared to the old pass manager. This attempt to enable the most obvious missing passes like the unroller, vectorization and other loop passes, fixing the existing FIXME comments.
Details
Diff Detail
Event Timeline
Added some passes to a LPM, to get the same indvar->unroll->indvar behavior as before.
Thanks for putting up the patch! I am a bit surprised that all those optimizations were not run during LTO and that this did not pop up before the switch.
llvm/lib/Passes/PassBuilder.cpp | ||
---|---|---|
1740–1741 | It looks like there are a few other passes missing here, like LoopDeletion and MergedLoadStoreMotionPass, and conditionally LoopInterchange/ConstraintElimination. Should they be added as well? | |
1762 | Should we run WarnMissedTransformationsPass here, like in the legacy PM? |
(now that c70737ba1dea landed, I should be able to provide some performance numbers on Darwin platforms soonish)
We didn't see any change in SPEC, surprisingly. I think it's much more important for certain code than others. We have some DSP routines where with LTO the loop range can be constant-propagated, allowing the whole loop to potentially be unrolled and greatly simplified. The largest increase I saw from this patch was a 1350% speed increase, with others in the 100-500% range. This didn't get us exactly back to the old pass manager though, which is probably expected and what remained was some much smaller positives, some negatives.
llvm/lib/Passes/PassBuilder.cpp | ||
---|---|---|
1740–1741 | I was trying to keep this simple, to fix the regressions sooner rather than later. I'll see what I can do to add a few others. |
Added a few extra passes.
llvm/lib/Passes/PassBuilder.cpp | ||
---|---|---|
1740–1741 | LoopInterchange I've not added because it's not anywhere in the NPM yet. It can be added when it's added to the main optimization passes. The others should be here now. |
It would be great to have some compile-time impact data before landing (https://llvm-compile-time-tracker.com/, clang self-host, etc.). This also depends on the potential performance gain on average. Just want to make sure we make the appropriate tradeoff.
"1350%" :)
This is only trying to get us back to where we were before, before all the regressions from the new pass manager. Whilst I can agree that compile time is important, this was a pretty significant set of regressions for us. Even if it is a compile time increase, the results I'm seeing would easily make it worth-while! The only real alternative is that we revert c70737b again until this can be fixed (which I was hoping not to have to do).
+1. From the FIXMEs it appears as if the omission of LoopVectorize. & co was not intentional, but because of initial bugs/limitations, so enabling them even if it impacts compile-time seems appropriate to me. This will help making the transition as smooth as possible and will avoid us having to track down/fix obvious regressions caused by big differences in the pipeline.
I think if we want to drastically adjust the pipeline, that should be done well after we ironed out the issues caused alone by the NewPM switch.
llvm/lib/Passes/PassBuilder.cpp | ||
---|---|---|
1740–1741 | Oh right, we don't have to add them all in one go. |
:-) I should have mentioned that I do think we should do this even with compile-time regression. It is just useful to know what the regression/perf gain ratio is so we have the proper expectations.
thanks for doing this. I think nobody was using monolithic LTO with the new PM before so nobody had looked into this
(I kicked off a run on https://llvm-compile-time-tracker.com/ with the patch)
Thanks I don't think I have access.
Is this https://llvm-compile-time-tracker.com/compare.php?from=ba2aa5f49ebbe28ad2dbf0c5bea451f0ebf436c6&to=351ad9bca5822d58b1aef519eea069135406613a&stat=instructions? It's an increase for NewPM-ReleaseLTO-g. But with the number of extra passes run, I would say that is in line with expectation.
Yes, that's the one. The impact looks fine to me for what this change does. Worth noting that even with this change NewPM-ReleaseLTO-g is still faster than LegacyPM-ReleaseLTO-g on average -- it just caught up a bit :)
LGTM, thanks! This brings the new PM LTO pipeline more in line with the legacy PM. Compile-time increases a bit, but as discussed in earlier comments, it's in line with the expectations.
Hi, it looks like this change broke our (Fuchsia's) toolchain builder. We are seeing linker errors when we do stage2 builds:
FAILED: lib/libLTO.so.13git : && /b/s/w/ir/x/w/staging/llvm_build/./bin/clang++ --sysroot=/b/s/w/ir/x/w/cipd/linux-amd64 -fPIC -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wno-comment -Wstring-conversion -fdiagnostics-color -ffunction-sections -fdata-sections -flto -ffile-prefix-map=/b/s/w/ir/x/w/staging/llvm_build/tools/clang/stage2-bins=../staging/llvm_build/tools/clang/stage2-bins -ffile-prefix-map=/b/s/w/ir/x/w/llvm-project/= -no-canonical-prefixes -O3 -DNDEBUG -static-libstdc++ -Wl,-z,defs -Wl,-z,nodelete -fuse-ld=lld -Wl,--color-diagnostics -flto -Wl,-O3 -Wl,--gc-sections -Wl,--version-script,"/b/s/w/ir/x/w/staging/llvm_build/tools/clang/stage2-bins/tools/lto/LTO.exports" -shared -Wl,-soname,libLTO.so.13git -o lib/libLTO.so.13git tools/lto/CMakeFiles/LTO.dir/LTODisassembler.cpp.o tools/lto/CMakeFiles/LTO.dir/lto.cpp.o -Wl,-rpath,"\$ORIGIN/../lib" lib/libLLVMX86AsmParser.a lib/libLLVMARMAsmParser.a lib/libLLVMAArch64AsmParser.a lib/libLLVMRISCVAsmParser.a lib/libLLVMX86CodeGen.a lib/libLLVMARMCodeGen.a lib/libLLVMAArch64CodeGen.a lib/libLLVMRISCVCodeGen.a lib/libLLVMX86Desc.a lib/libLLVMARMDesc.a lib/libLLVMAArch64Desc.a lib/libLLVMRISCVDesc.a lib/libLLVMX86Disassembler.a lib/libLLVMARMDisassembler.a lib/libLLVMAArch64Disassembler.a lib/libLLVMRISCVDisassembler.a lib/libLLVMX86Info.a lib/libLLVMARMInfo.a lib/libLLVMAArch64Info.a lib/libLLVMRISCVInfo.a lib/libLLVMBitReader.a lib/libLLVMCore.a lib/libLLVMCodeGen.a lib/libLLVMLTO.a lib/libLLVMMC.a lib/libLLVMMCDisassembler.a lib/libLLVMSupport.a lib/libLLVMTarget.a lib/libLLVMCFGuard.a lib/libLLVMAsmPrinter.a lib/libLLVMDebugInfoDWARF.a lib/libLLVMGlobalISel.a lib/libLLVMSelectionDAG.a lib/libLLVMARMDesc.a lib/libLLVMARMInfo.a lib/libLLVMARMUtils.a lib/libLLVMAArch64Desc.a lib/libLLVMAArch64Info.a lib/libLLVMAArch64Utils.a lib/libLLVMRISCVDesc.a lib/libLLVMRISCVInfo.a lib/libLLVMMCDisassembler.a lib/libLLVMCodeGen.a lib/libLLVMExtensions.a lib/libLLVMPasses.a lib/libLLVMTarget.a lib/libLLVMObjCARCOpts.a lib/libLLVMCoroutines.a lib/libLLVMipo.a lib/libLLVMScalarOpts.a lib/libLLVMBitWriter.a lib/libLLVMAggressiveInstCombine.a lib/libLLVMInstCombine.a lib/libLLVMLinker.a lib/libLLVMFrontendOpenMP.a lib/libLLVMIRReader.a lib/libLLVMAsmParser.a lib/libLLVMVectorize.a lib/libLLVMInstrumentation.a lib/libLLVMTransformUtils.a lib/libLLVMAnalysis.a lib/libLLVMProfileData.a lib/libLLVMObject.a lib/libLLVMBitReader.a lib/libLLVMCore.a lib/libLLVMRemarks.a lib/libLLVMBitstreamReader.a lib/libLLVMMCParser.a lib/libLLVMMC.a lib/libLLVMDebugInfoCodeView.a lib/libLLVMDebugInfoMSF.a lib/libLLVMTextAPI.a lib/libLLVMBinaryFormat.a lib/libLLVMSupport.a -lrt -ldl -lpthread -lm /b/s/w/ir/x/w/staging/zlib_install/lib/libz.a lib/libLLVMDemangle.a && : ld.lld: /b/s/w/ir/x/w/llvm-project/llvm/lib/Analysis/MemorySSAUpdater.cpp:1275: void llvm::MemorySSAUpdater::wireOldPredecessorsToNewImmediatePredecessor(llvm::BasicBlock *, llvm::BasicBlock *, ArrayRef<llvm::BasicBlock *>, bool): Assertion `!MSSA->getWritableBlockAccesses(New) && "Access list should be null for a new block."' failed. clang++: error: unable to execute command: Aborted clang++: error: linker command failed due to signal (use -v to see invocation)
We tried to revert this change locally and this error goes away. Could you revert this change please?
Hello. Yeah I've been looking at a similar thing. Let me try and fix it, after verifying it works. The LPM is not preserving MemorySSA.
Thanks for the report.
I've committed c141c6551be64f220b71786d24e98f6de906e6de in an attempt to fix the similar problem I was seeing here. Please let me know if it helps or doesn't in the Fuchsia build.
It looks like there are a few other passes missing here, like LoopDeletion and MergedLoadStoreMotionPass, and conditionally LoopInterchange/ConstraintElimination. Should they be added as well?