This is an archive of the discontinued LLVM Phabricator instance.

[NPM][LTO] Update buildLTODefaultPipeline to be more in-line with the old pass manager
ClosedPublic

Authored by dmgreen on Feb 16 2021, 6:22 AM.

Details

Summary

The NPM LTO pipeline has a lot of fixme's and missing passes, causing a lot of regressions after the switch in c70737b. Notably unrolling and vectorization were both disabled, but many other passes are missing compared to the old pass manager. This attempt to enable the most obvious missing passes like the unroller, vectorization and other loop passes, fixing the existing FIXME comments.

Diff Detail

Event Timeline

dmgreen created this revision.Feb 16 2021, 6:22 AM
dmgreen requested review of this revision.Feb 16 2021, 6:22 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 16 2021, 6:22 AM
dmgreen updated this revision to Diff 324009.Feb 16 2021, 7:32 AM

Added some passes to a LPM, to get the same indvar->unroll->indvar behavior as before.

fhahn added a comment.Feb 16 2021, 8:45 AM

Thanks for putting up the patch! I am a bit surprised that all those optimizations were not run during LTO and that this did not pop up before the switch.

llvm/lib/Passes/PassBuilder.cpp
1740–1741

It looks like there are a few other passes missing here, like LoopDeletion and MergedLoadStoreMotionPass, and conditionally LoopInterchange/ConstraintElimination. Should they be added as well?

1762

Should we run WarnMissedTransformationsPass here, like in the legacy PM?

fhahn added a comment.Feb 16 2021, 8:53 AM

(now that c70737ba1dea landed, I should be able to provide some performance numbers on Darwin platforms soonish)

dmgreen added a comment.EditedFeb 16 2021, 9:39 AM

Thanks for putting up the patch! I am a bit surprised that all those optimizations were not run during LTO and that this did not pop up before the switch.

We didn't see any change in SPEC, surprisingly. I think it's much more important for certain code than others. We have some DSP routines where with LTO the loop range can be constant-propagated, allowing the whole loop to potentially be unrolled and greatly simplified. The largest increase I saw from this patch was a 1350% speed increase, with others in the 100-500% range. This didn't get us exactly back to the old pass manager though, which is probably expected and what remained was some much smaller positives, some negatives.

llvm/lib/Passes/PassBuilder.cpp
1740–1741

I was trying to keep this simple, to fix the regressions sooner rather than later. I'll see what I can do to add a few others.

dmgreen updated this revision to Diff 324046.Feb 16 2021, 10:16 AM
dmgreen marked 2 inline comments as done.

Added a few extra passes.

llvm/lib/Passes/PassBuilder.cpp
1740–1741

LoopInterchange I've not added because it's not anywhere in the NPM yet. It can be added when it's added to the main optimization passes. The others should be here now.

It would be great to have some compile-time impact data before landing (https://llvm-compile-time-tracker.com/, clang self-host, etc.). This also depends on the potential performance gain on average. Just want to make sure we make the appropriate tradeoff.

It would be great to have some compile-time impact data before landing (https://llvm-compile-time-tracker.com/, clang self-host, etc.). This also depends on the potential performance gain on average. Just want to make sure we make the appropriate tradeoff.

"1350%" :)

This is only trying to get us back to where we were before, before all the regressions from the new pass manager. Whilst I can agree that compile time is important, this was a pretty significant set of regressions for us. Even if it is a compile time increase, the results I'm seeing would easily make it worth-while! The only real alternative is that we revert c70737b again until this can be fixed (which I was hoping not to have to do).

fhahn added a comment.Feb 16 2021, 1:16 PM

It would be great to have some compile-time impact data before landing (https://llvm-compile-time-tracker.com/, clang self-host, etc.). This also depends on the potential performance gain on average. Just want to make sure we make the appropriate tradeoff.

"1350%" :)

This is only trying to get us back to where we were before, before all the regressions from the new pass manager. Whilst I can agree that compile time is important, this was a pretty significant set of regressions for us.

+1. From the FIXMEs it appears as if the omission of LoopVectorize. & co was not intentional, but because of initial bugs/limitations, so enabling them even if it impacts compile-time seems appropriate to me. This will help making the transition as smooth as possible and will avoid us having to track down/fix obvious regressions caused by big differences in the pipeline.

I think if we want to drastically adjust the pipeline, that should be done well after we ironed out the issues caused alone by the NewPM switch.

llvm/lib/Passes/PassBuilder.cpp
1740–1741

Oh right, we don't have to add them all in one go.

fhahn added a comment.Feb 16 2021, 1:19 PM

It would be great to have some compile-time impact data before landing (https://llvm-compile-time-tracker.com/, clang self-host, etc.). This also depends on the potential performance gain on average. Just want to make sure we make the appropriate tradeoff.

(I kicked off a run on https://llvm-compile-time-tracker.com/ with the patch)

ychen added a comment.Feb 16 2021, 1:26 PM

:-) I should have mentioned that I do think we should do this even with compile-time regression. It is just useful to know what the regression/perf gain ratio is so we have the proper expectations.

thanks for doing this. I think nobody was using monolithic LTO with the new PM before so nobody had looked into this

(I kicked off a run on https://llvm-compile-time-tracker.com/ with the patch)

Thanks I don't think I have access.

Is this https://llvm-compile-time-tracker.com/compare.php?from=ba2aa5f49ebbe28ad2dbf0c5bea451f0ebf436c6&to=351ad9bca5822d58b1aef519eea069135406613a&stat=instructions? It's an increase for NewPM-ReleaseLTO-g. But with the number of extra passes run, I would say that is in line with expectation.

nikic added a subscriber: nikic.Feb 16 2021, 2:30 PM

(I kicked off a run on https://llvm-compile-time-tracker.com/ with the patch)

Thanks I don't think I have access.

Is this https://llvm-compile-time-tracker.com/compare.php?from=ba2aa5f49ebbe28ad2dbf0c5bea451f0ebf436c6&to=351ad9bca5822d58b1aef519eea069135406613a&stat=instructions? It's an increase for NewPM-ReleaseLTO-g. But with the number of extra passes run, I would say that is in line with expectation.

Yes, that's the one. The impact looks fine to me for what this change does. Worth noting that even with this change NewPM-ReleaseLTO-g is still faster than LegacyPM-ReleaseLTO-g on average -- it just caught up a bit :)

fhahn accepted this revision.Feb 17 2021, 8:37 AM

LGTM, thanks! This brings the new PM LTO pipeline more in line with the legacy PM. Compile-time increases a bit, but as discussed in earlier comments, it's in line with the expectations.

This revision is now accepted and ready to land.Feb 17 2021, 8:37 AM

Thanks folks.

Hi, it looks like this change broke our (Fuchsia's) toolchain builder. We are seeing linker errors when we do stage2 builds:

FAILED: lib/libLTO.so.13git 
: && /b/s/w/ir/x/w/staging/llvm_build/./bin/clang++ --sysroot=/b/s/w/ir/x/w/cipd/linux-amd64 -fPIC -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wno-comment -Wstring-conversion -fdiagnostics-color -ffunction-sections -fdata-sections -flto -ffile-prefix-map=/b/s/w/ir/x/w/staging/llvm_build/tools/clang/stage2-bins=../staging/llvm_build/tools/clang/stage2-bins -ffile-prefix-map=/b/s/w/ir/x/w/llvm-project/= -no-canonical-prefixes -O3 -DNDEBUG  -static-libstdc++ -Wl,-z,defs -Wl,-z,nodelete -fuse-ld=lld -Wl,--color-diagnostics -flto   -Wl,-O3 -Wl,--gc-sections  -Wl,--version-script,"/b/s/w/ir/x/w/staging/llvm_build/tools/clang/stage2-bins/tools/lto/LTO.exports" -shared -Wl,-soname,libLTO.so.13git -o lib/libLTO.so.13git tools/lto/CMakeFiles/LTO.dir/LTODisassembler.cpp.o tools/lto/CMakeFiles/LTO.dir/lto.cpp.o  -Wl,-rpath,"\$ORIGIN/../lib"  lib/libLLVMX86AsmParser.a  lib/libLLVMARMAsmParser.a  lib/libLLVMAArch64AsmParser.a  lib/libLLVMRISCVAsmParser.a  lib/libLLVMX86CodeGen.a  lib/libLLVMARMCodeGen.a  lib/libLLVMAArch64CodeGen.a  lib/libLLVMRISCVCodeGen.a  lib/libLLVMX86Desc.a  lib/libLLVMARMDesc.a  lib/libLLVMAArch64Desc.a  lib/libLLVMRISCVDesc.a  lib/libLLVMX86Disassembler.a  lib/libLLVMARMDisassembler.a  lib/libLLVMAArch64Disassembler.a  lib/libLLVMRISCVDisassembler.a  lib/libLLVMX86Info.a  lib/libLLVMARMInfo.a  lib/libLLVMAArch64Info.a  lib/libLLVMRISCVInfo.a  lib/libLLVMBitReader.a  lib/libLLVMCore.a  lib/libLLVMCodeGen.a  lib/libLLVMLTO.a  lib/libLLVMMC.a  lib/libLLVMMCDisassembler.a  lib/libLLVMSupport.a  lib/libLLVMTarget.a  lib/libLLVMCFGuard.a  lib/libLLVMAsmPrinter.a  lib/libLLVMDebugInfoDWARF.a  lib/libLLVMGlobalISel.a  lib/libLLVMSelectionDAG.a  lib/libLLVMARMDesc.a  lib/libLLVMARMInfo.a  lib/libLLVMARMUtils.a  lib/libLLVMAArch64Desc.a  lib/libLLVMAArch64Info.a  lib/libLLVMAArch64Utils.a  lib/libLLVMRISCVDesc.a  lib/libLLVMRISCVInfo.a  lib/libLLVMMCDisassembler.a  lib/libLLVMCodeGen.a  lib/libLLVMExtensions.a  lib/libLLVMPasses.a  lib/libLLVMTarget.a  lib/libLLVMObjCARCOpts.a  lib/libLLVMCoroutines.a  lib/libLLVMipo.a  lib/libLLVMScalarOpts.a  lib/libLLVMBitWriter.a  lib/libLLVMAggressiveInstCombine.a  lib/libLLVMInstCombine.a  lib/libLLVMLinker.a  lib/libLLVMFrontendOpenMP.a  lib/libLLVMIRReader.a  lib/libLLVMAsmParser.a  lib/libLLVMVectorize.a  lib/libLLVMInstrumentation.a  lib/libLLVMTransformUtils.a  lib/libLLVMAnalysis.a  lib/libLLVMProfileData.a  lib/libLLVMObject.a  lib/libLLVMBitReader.a  lib/libLLVMCore.a  lib/libLLVMRemarks.a  lib/libLLVMBitstreamReader.a  lib/libLLVMMCParser.a  lib/libLLVMMC.a  lib/libLLVMDebugInfoCodeView.a  lib/libLLVMDebugInfoMSF.a  lib/libLLVMTextAPI.a  lib/libLLVMBinaryFormat.a  lib/libLLVMSupport.a  -lrt  -ldl  -lpthread  -lm  /b/s/w/ir/x/w/staging/zlib_install/lib/libz.a  lib/libLLVMDemangle.a && :
ld.lld: /b/s/w/ir/x/w/llvm-project/llvm/lib/Analysis/MemorySSAUpdater.cpp:1275: void llvm::MemorySSAUpdater::wireOldPredecessorsToNewImmediatePredecessor(llvm::BasicBlock *, llvm::BasicBlock *, ArrayRef<llvm::BasicBlock *>, bool): Assertion `!MSSA->getWritableBlockAccesses(New) && "Access list should be null for a new block."' failed.
clang++: error: unable to execute command: Aborted
clang++: error: linker command failed due to signal (use -v to see invocation)

We tried to revert this change locally and this error goes away. Could you revert this change please?

Hello. Yeah I've been looking at a similar thing. Let me try and fix it, after verifying it works. The LPM is not preserving MemorySSA.

Thanks for the report.

I've committed c141c6551be64f220b71786d24e98f6de906e6de in an attempt to fix the similar problem I was seeing here. Please let me know if it helps or doesn't in the Fuchsia build.

I've committed c141c6551be64f220b71786d24e98f6de906e6de in an attempt to fix the similar problem I was seeing here. Please let me know if it helps or doesn't in the Fuchsia build.

Thanks. This looks like it fixed our builders.