Currently UnrollLoops/VectorizeLoop/VectorizeSLP in CGOpts are not passed down to pass builder in ltobackend when new pass manager is used. This is inconsistent with the behavior when new pass manager is used and thinlto is not used. Such inconsistency causes slp vectorization pass not being enabled in ltobackend for O3 + thinlto right now. This patch fixes that.
I think the old PM has some issues as well, although more in the opposite direction. The code in LTOBackend.cpp:runOldPMPasses is simply hardcoding loop and slp vectorization flags to true, but presumably should use whatever is passed through the lto Config as well. Not sure about LoopsInterleaved, ah looks like it is set to the value of the internal option which is enabled by default. That and DisableUnrollLoops should also use whatever is passed through here (similar to how they are set in clang BackendUtil.cpp when configuring the old PM).
Looks like the PTO constructor sets this to true, so we were getting lucky here.
Looks like this only affects the values passed to LoopVectorizePass. Is there a way to test that it is being enabled properly now?
Is this enabled by default at -O3? I looked at clang but couldn't tell where that would be happening. If so, then llvm/test/Other/new-pm-thinlto-defaults.ll should be fixed to check that it is added to the postlink pipeline at -O3 to make sure we don't regress here again.
Thinking through this some more - this patch fixes the distributed ThinLTO backend case, but not the in-process ThinLTO case. For that, both lld and gold plugin will need to set this up. And since they don't have the clang options, they will just need to set to some reasonable default, probably either hardwired on like in the old PM or set based on the CGOptLevel they set from linker options. However, they both default to -O2 unless a special plugin or lld linker option is set, so we'd probably want to enable these passes there at O2+.
I don't see any. There is a test for each that -debug-pass-manager and new-pass-manager are passed through correctly:
Perhaps those can be extended to actually check the passes that get enabled, similar to llvm/test/Other/new-pm-thinlto-defaults.ll?
Enable loop and slp vectorization in O2/O3 (Os for opt as well) in gold plugin, lld, opt and llvm-lto2. Add related tests.
There are still some issues like we cannot override the default setting in O2/O3 using internal flags like -vectorize-loops, -vectorize-slp, and the issues like those setting are now hard coded in old pass manager which Teresa mentioned. I leave them as it is for the moment and let this patch just handle the major issue of slp being disabled in thinlto backend.
For vectorization, given a function with vectorizable loop, I test it by specifying -force-vector-width=2 and -force-vector-interleave=1 and checking whether the IR will contain llvm.loop.isvectorized metadata at the end of the pipeline.
Similarly for interleaving, I test it by specifying -force-vector-width=1 and -force-vector-interleave=2 and checking whether the IR will contain llvm.loop.isvectorized metadata at the end of the pipeline.
Yes, -vectorize-slp will be added as cc1 option when O2/O3 is specified for clang.
I fixed llvm/test/Other/new-pm-thinlto-defaults.ll, but llvm/test/Other/new-pm-thinlto-defaults.ll depends on the setting in opt instead of clang, so I also enabled slp and loop vectorization in O2, O3 and Os in opt.
Thanks! Looks pretty good, I just have a few comments and questions.
Can you clarify the difference between -vectorize-slp and -mllvm -vectorize-slp=false? Is this related to you comment about not being able to use the internal options to disable it? Worth a comment in that case. Ditto for the vectorize-loops tests below.
I think the above testing was cloned from another existing lld test - can it be removed from this one since not related to vectorization?
add doxygen comment
Does clang run this at -Os?
Similar comment to new lld test - I think the above lines are from another test and can be removed from here?
-vectorize-slp is a cc1 option and will be added automatically when O2/O3/Os/Oz is available. Once -vectorize-slp is enabled, "-mllvm -vectorize-slp=false" won't disable it currently. I added "-mllvm -vectorize-slp=false" here in the test to ensure the slp vectorization is executed because the -vectorize-slp cc1 flag is passed down, not because -mllvm -vectorize-slp is enabled.
Yes it does. But surprisingly I find clang add "-vectorize-slp" at Oz as well but doesn't not add "-vectorize-loops" at Oz. This is another inconsistency with opt old pass manager, which enable neither slp nor vectorize-loops at Oz. I am not sure how it should behave.
Why add "-mllvm -vectorize-slp=false" to the invocation lines though? If it is to illustrate that the internal option cannot override the cc1 option, then add comments to that effect.
Hmm, I think probably best to be consistent with the old PM on this for now, and add a comment where you enable these differently. This should presumably be reevaluated, but probably best if the two are in sync for now.
I add it to the invocation line because if the default value of "-mllvm -vectorize-slp" is true, even without this patch, the test will still pass. That is why I want to add "-mllvm -vectorize-slp=false" explicitly to ensure the patch is effective.
Ok, thanks. I will add a comment at the place where it sets this.
A few minor suggestions for comment tweaks, but lgtm otherwise.
maybe add "by default" to the end of the last sentence for clarity? Ditto elsewhere.
Do you mean "consistent with old pass manager invoked via opt"? It currently reads a little confusing, because the old PM is also invoked via clang.