This is an archive of the discontinued LLVM Phabricator instance.

Differential D112851

[PassManager] `buildModuleOptimizationPipeline()`: schedule `LoopDeletion` pass run before vectorization passes
Closed, Public

Authored by lebedev.ri on Oct 29 2021, 3:40 PM.

Details

Reviewers

aeubanks
asbirlea
reames
mkazantsev
fhahn
jdoerfert
nikic

Commits

rG9c2469c1ddb3: [PassManager] `buildModuleOptimizationPipeline()`: schedule `LoopDeletion` pass…

Summary

Test thanks to Michael Kuklinski from #llvm: https://godbolt.org/z/bdrah5Goo
originally inspired by Daniel Lemire's https://lemire.me/blog/2021/10/26/in-c-is-empty-faster-than-comparing-the-size-with-zero/

We manage to deduce that the answer does not require looping,
but we do that after the last LoopDeletion pass run,
so we end up being stuck with a dead loop.
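
For readers without access to the godbolt link, the pattern can be sketched in C++ roughly as follows. This is a hypothetical reconstruction from the names that appear in the regression test (`count_nodes_variant1`, `is_not_empty_variant1`, `struct.node`); the field name `value` and the helper `is_not_empty_expected` are assumptions made for illustration, not the original source.

```cpp
// Hypothetical reconstruction; `value` is an assumed name for the i32 field.
struct node {
  node *next;
  int value;
};

// Walks the whole list just to count its nodes.
int count_nodes_variant1(const node *p) {
  int size = 0;
  while (p) {
    p = p->next;
    ++size;
  }
  return size;
}

// The caller only asks whether at least one node exists.
bool is_not_empty_variant1(const node *p) {
  return count_nodes_variant1(p) > 0;
}

// What the optimized code should amount to: a single null check, no loop.
bool is_not_empty_expected(const node *p) { return p != nullptr; }
```

Once the comparison is known to depend only on whether the loop runs at least once, the loop itself has no observable effect, which is what makes it a candidate for `LoopDeletion`.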

Now, as with all things SCEV, this has a very expected ~+0.12% compile time performance regression:
https://llvm-compile-time-tracker.com/compare.php?from=0ae7bf124a9bca76dd9a91b2f7379168ff13f562&to=c2ae57c9b961aeb4a28c747266949340613a6d84&stat=instructions
(for comparison, doing that in function simplification pipeline
would have been ~+0.5% compile time performance regression, D112840)

Looking at the transformation stats over vanilla test-suite, i think it's rather expected:

| statistic name                                   |  baseline |  proposed |     Δ |      % |  \|%\| |
|--------------------------------------------------|----------:|----------:|------:|-------:|-------:|
| scalar-evolution.NumBruteForceTripCountsComputed |       789 |       888 |    99 | 12.55% | 12.55% |
| scalar-evolution.NumTripCountsNotComputed        |    105592 |    117900 | 12308 | 11.66% | 11.66% |
| loop-delete.NumBackedgesBroken                   |       542 |       559 |    17 |  3.14% |  3.14% |
| regalloc.numExtends                              |        81 |        79 |    -2 | -2.47% |  2.47% |
| indvars.NumFoldedUser                            |       408 |       400 |    -8 | -1.96% |  1.96% |
| indvars.NumElimCmp                               |      3831 |      3758 |   -73 | -1.91% |  1.91% |
| scalar-evolution.NumTripCountsComputed           |    299759 |    304278 |  4519 |  1.51% |  1.51% |
| loop-delete.NumDeleted                           |      8055 |      8128 |    73 |  0.91% |  0.91% |
| machine-cse.NumCommutes                          |       111 |       110 |    -1 | -0.90% |  0.90% |
| globaldce.NumFunctions                           |      1187 |      1192 |     5 |  0.42% |  0.42% |
| codegenprepare.NumSelectsExpanded                |       277 |       278 |     1 |  0.36% |  0.36% |
| loop-unroll.NumRuntimeUnrolled                   |     13841 |     13791 |   -50 | -0.36% |  0.36% |
| machinelicm.NumPostRAHoisted                     |      1168 |      1172 |     4 |  0.34% |  0.34% |
| phi-node-elimination.NumCriticalEdgesSplit       |     83054 |     82879 |  -175 | -0.21% |  0.21% |
| machine-cse.NumPREs                              |      3085 |      3079 |    -6 | -0.19% |  0.19% |
| branch-folder.NumBranchOpts                      |    108122 |    107942 |  -180 | -0.17% |  0.17% |
| loop-unroll.NumUnrolled                          |     40136 |     40067 |   -69 | -0.17% |  0.17% |
| branch-folder.NumDeadBlocks                      |    130818 |    130607 |  -211 | -0.16% |  0.16% |
| codegenprepare.NumBlocksElim                     |     92856 |     92714 |  -142 | -0.15% |  0.15% |
| instsimplify.NumSimplified                       |    103263 |    103129 |  -134 | -0.13% |  0.13% |
| instcombine.NumConstProp                         |     26070 |     26102 |    32 |  0.12% |  0.12% |
| instsimplify.NumExpand                           |      1716 |      1718 |     2 |  0.12% |  0.12% |
| loop-unroll.NumCompletelyUnrolled                |      9236 |      9225 |   -11 | -0.12% |  0.12% |
| branch-folder.NumHoist                           |      2773 |      2770 |    -3 | -0.11% |  0.11% |
| regalloc.NumReloadsRemoved                       |     10822 |     10834 |    12 |  0.11% |  0.11% |
| regalloc.NumSnippets                             |     11394 |     11406 |    12 |  0.11% |  0.11% |
| machine-cse.NumCrossBBCSEs                       |      1052 |      1053 |     1 |  0.10% |  0.10% |
| machinelicm.NumCSEed                             |     99887 |     99784 |  -103 | -0.10% |  0.10% |
| branch-folder.NumTailMerge                       |     72501 |     72435 |   -66 | -0.09% |  0.09% |
| codegenprepare.NumExtUses                        |     22007 |     21987 |   -20 | -0.09% |  0.09% |
| local.NumRemoved                                 |     68232 |     68294 |    62 |  0.09% |  0.09% |
| loop-vectorize.LoopsAnalyzed                     |     75483 |     75413 |   -70 | -0.09% |  0.09% |

Note that i'm only changing current PM, and not touching obsolete PM.

This is an alternative to the function simplification pipeline variant of the same change, D112840.
It has both less compile time impact (since the additional number of SCEV trip count calculations
is way less than with D112840), and it is much more powerful/impactful (almost 2x more loops deleted).
I have checked, and doing this after loop rotation is favorable (more loops deleted).

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

lebedev.ri created this revision. Oct 29 2021, 3:40 PM

Herald added subscribers: ormris, wenlei, steven_wu and 2 others. Oct 29 2021, 3:40 PM

lebedev.ri requested review of this revision. Oct 29 2021, 3:40 PM

lebedev.ri mentioned this in D112840: [PassManager] `buildFunctionSimplificationPipeline()`: schedule another `LoopDeletion` pass run before last `LICM`.

lebedev.ri edited the summary of this revision. Oct 29 2021, 3:45 PM

Harbormaster completed remote builds in B131529: Diff 383516. Oct 29 2021, 4:12 PM

Deletion of loops before they are vectorized absolutely makes sense to me. Please give it a couple more days just in case someone has concerns regarding the regressions, but making the vectorizer run on empty doesn't sound right. Please give it 2-3 days to hear others' concerns.

This revision is now accepted and ready to land. Nov 1 2021, 12:19 AM

In D112851#3099588, @mkazantsev wrote:

> Deletion of loops before they are vectorized absolutely makes sense to me. Please give it a couple more days just in case someone has concerns regarding the regressions, but making the vectorizer run on empty doesn't sound right. Please give it 2-3 days to hear others' concerns.

Indeed, even if/when what @aeubanks suggested in https://reviews.llvm.org/D112840#3097891 is implemented,
somehow i don't think it will alleviate the need for *this*-late LoopDeletion.

Thank you for the review, i'll wait a bit on @aeubanks / @asbirlea / @dmgreen,
though given just how little of a compile-time regression this has become,
i can't imagine this will be blocked.

Yeah, seems OK to me.

(Copied from the other change)
I looked into @is_not_empty_variant1(). It seems like GVN + an instcombine cleanup is what's allowing the loop to be deleted. Currently GVN runs after the loop passes which is the issue.
I tried replacing the EarlyCSEPass at the beginning of the function simplification pipeline with a NewGVNPass and that fixes @is_not_empty_variant1. I think something along those lines is more principled than adding a LoopDeletionPass somewhere.
(Of course, if we want to turn on NewGVN specifically we need to iron out remaining issues)
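
The phase-ordering problem described above can be illustrated with a deliberately tiny toy model. Nothing here is LLVM API: `ToyFunction`, `toy_gvn`, `toy_loop_deletion` and `loop_survives` are hypothetical stand-ins, and the "IR" is reduced to two booleans. The sketch only shows that a deletion pass which runs before the fact it depends on has been established cannot fire, while a later run can.

```cpp
#include <functional>
#include <vector>

// Toy stand-in for a function's IR: two facts about a single loop.
struct ToyFunction {
  bool has_loop = true;            // the counting loop is still present
  bool loop_result_needed = true;  // nothing has yet proven the count irrelevant
};

using ToyPass = std::function<void(ToyFunction &)>;

// Stand-in for "GVN + instcombine cleanup": proves the result does not
// depend on what the loop computes.
void toy_gvn(ToyFunction &f) { f.loop_result_needed = false; }

// Stand-in for LoopDeletion: removes the loop only once it is known dead.
void toy_loop_deletion(ToyFunction &f) {
  if (f.has_loop && !f.loop_result_needed)
    f.has_loop = false;
}

// Runs the passes in order and reports whether the loop is still there.
bool loop_survives(const std::vector<ToyPass> &pipeline) {
  ToyFunction f;
  for (const ToyPass &pass : pipeline)
    pass(f);
  return f.has_loop;
}
```

In this model the pipeline `{toy_loop_deletion, toy_gvn}` leaves the loop in place, whereas appending a second `toy_loop_deletion` removes it. Moving the GVN-like step earlier would also work, which is the alternative being proposed in the comment.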

In D112851#3099665, @lebedev.ri wrote:

> In D112851#3099588, @mkazantsev wrote:
>
> > Deletion of loops before they are vectorized absolutely makes sense to me. Please give it a couple more days just in case someone has concerns regarding the regressions, but making the vectorizer run on empty doesn't sound right. Please give it 2-3 days to hear others' concerns.
>
> Indeed, even if/when what @aeubanks suggested in https://reviews.llvm.org/D112840#3097891 is implemented,
> somehow i don't think it will alleviate the need for *this*-late LoopDeletion.

Can you explain why *this* late LoopDeletion would still be necessary even with the GVN phase ordering changes?
It makes sense to me that specifically GVN can optimize enough to make some loops deletable. Ideally we'd already delete all deletable loops in the function simplification pipeline rather than late in the pipeline. Anything to do with deleting unused code should trigger in the simplification pipeline, not the late optimization pipeline, since deleting unnecessary instructions is a canonicalization.
And I'm sure there are many other loop passes that could benefit from having GVN run first.

Take all of this with a grain of salt because running an extra GVN pass is very expensive. We run EarlyCSE at the beginning of the function simplification pipeline as a mini-GVN, but replacing that with NewGVN, which is supposed to be significantly faster than GVN, still results in major compile time regressions. https://llvm-compile-time-tracker.com/compare.php?from=1c05c52de2177a328b7d2d07b697af67eb9f8122&to=755e184ed0216ef39d68621a9b19a3dd34d677f2&stat=instructions

So overall I think this is a hack to fix a fairly specific use case, but a proper fix is nowhere in the near future. If you really think this snippet of code is important to optimize then I'm not opposed to this change until we have a more proper fix in the future.
We can always tack on passes to the end of the pipeline to handle missed optimizations due to phase ordering, it doesn't mean it's principled. (of course if you can explain why an extra loop deletion pass on top of the one in the simplification pipeline is principled I'm all ears)

> Thank you for the review, i'll wait a bit on @aeubanks / @asbirlea / @dmgreen,
> though given just how little of a compile-time regression this has become,
> i can't imagine this will be blocked.

In D112851#3101031, @aeubanks wrote:

> (Copied from the other change)
> I looked into @is_not_empty_variant1(). It seems like GVN + an instcombine cleanup is what's allowing the loop to be deleted. Currently GVN runs after the loop passes which is the issue.
> I tried replacing the EarlyCSEPass at the beginning of the function simplification pipeline with a NewGVNPass and that fixes @is_not_empty_variant1. I think something along those lines is more principled than adding a LoopDeletionPass somewhere.
> (Of course, if we want to turn on NewGVN specifically we need to iron out remaining issues)

In D112851#3101031, @aeubanks wrote:

> In D112851#3099665, @lebedev.ri wrote:
>
> > In D112851#3099588, @mkazantsev wrote:
> >
> > > Deletion of loops before they are vectorized absolutely makes sense to me. Please give it a couple more days just in case someone has concerns regarding the regressions, but making the vectorizer run on empty doesn't sound right. Please give it 2-3 days to hear others' concerns.
> >
> > Indeed, even if/when what @aeubanks suggested in https://reviews.llvm.org/D112840#3097891 is implemented,
> > somehow i don't think it will alleviate the need for *this*-late LoopDeletion.
>
> Can you explain why *this* late LoopDeletion would still be necessary even with the GVN phase ordering changes?

Well, i'm not actually saying that, i'm only speculating as much.
Let's suppose GVN phase ordering change happens, and D112840 would become unnecessary,
But, i posted this change (replacing D112840) because while D112840 (on vanilla test-suite) caused

| statistic name                                   |  baseline |  proposed |      Δ |      % |  \|%\| |
|--------------------------------------------------|----------:|----------:|-------:|-------:|-------:|
| loop-delete.NumBackedgesBroken                   |       542 |       557 |     15 |  2.77% |  2.77% |
| loop-delete.NumDeleted                           |      8055 |      8096 |     41 |  0.51% |  0.51% |

while this variant causes

| statistic name                                   |  baseline |  proposed |     Δ |      % |  \|%\| |
|--------------------------------------------------|----------:|----------:|------:|-------:|-------:|
| loop-delete.NumBackedgesBroken                   |       542 |       559 |    17 |  3.14% |  3.14% |
| loop-delete.NumDeleted                           |      8055 |      8128 |    73 |  0.91% |  0.91% |

Aka, as compared to baseline, this deletes ~80% more loops than D112840 would.
Notably, this new LoopDelete is placed after LoopRotatePass, because i have checked, and that
also causes an increase in num loops deleted, as compared with placing it before LoopRotatePass.
So right now, i would not expect that GVN phase ordering improvement will make this late LoopDeletion unnecessary,
but if i'm proven wrong and it does, then great and this could be reverted.
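
The "~80% more loops" figure follows directly from the two `loop-delete.NumDeleted` rows quoted above. As a small worked check (the constant and function names are made up for this sketch; only the three counts come from the tables):

```cpp
// loop-delete.NumDeleted counts quoted in the tables (vanilla test-suite).
constexpr int kBaselineDeleted = 8055;
constexpr int kD112840Deleted = 8096;  // function simplification pipeline variant
constexpr int kD112851Deleted = 8128;  // this change

// Additional loops deleted relative to the baseline.
constexpr int extra_deleted(int proposed) { return proposed - kBaselineDeleted; }

// Relative gain of one variant's extra deletions over the other's.
constexpr double relative_gain(int extra_a, int extra_b) {
  return static_cast<double>(extra_a - extra_b) / extra_b;
}
```

With 73 extra deletions for this change against 41 for D112840, the gain is (73 - 41) / 41, or roughly 78%, which matches both the "~80%" here and the "almost 2x" wording in the summary.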

> It makes sense to me that specifically GVN can optimize enough to make some loops deletable. Ideally we'd already delete all deletable loops in the function simplification pipeline rather than late in the pipeline. Anything to do with deleting unused code should trigger in the simplification pipeline, not the late optimization pipeline, since deleting unnecessary instructions is a canonicalization.
> And I'm sure there are many other loop passes that could benefit from having GVN run first.

Sure, i'm not arguing that we *shouldn't* succeed with this in simplification pipeline,
i'm only saying that at least currently, even more loops become dead *after* the simplification pipeline, in optimization pipeline.

> Take all of this with a grain of salt because running an extra GVN pass is very expensive. We run EarlyCSE at the beginning of the function simplification pipeline as a mini-GVN, but replacing that with NewGVN, which is supposed to be significantly faster than GVN, still results in major compile time regressions. https://llvm-compile-time-tracker.com/compare.php?from=1c05c52de2177a328b7d2d07b697af67eb9f8122&to=755e184ed0216ef39d68621a9b19a3dd34d677f2&stat=instructions

Right.

> So overall I think this is a hack to fix a fairly specific use case, but a proper fix is nowhere in the near future. If you really think this snippet of code is important to optimize then I'm not opposed to this change until we have a more proper fix in the future.
> We can always tack on passes to the end of the pipeline to handle missed optimizations due to phase ordering, it doesn't mean it's principled. (of course if you can explain why an extra loop deletion pass on top of the one in the simplification pipeline is principled I'm all ears)

Yay, i love ugly hacks.

> > Thank you for the review, i'll wait a bit on @aeubanks / @asbirlea / @dmgreen,
> > though given just how little of a compile-time regression this has become,
> > i can't imagine this will be blocked.

I think unless there will be concerns raised by then, i'll land this friday.
After all, this isn't setting things in stone, once the GVN issue is resolved this can be revisited.

In D112851#3105344, @lebedev.ri wrote:

> I think unless there will be concerns raised by then, i'll land this friday.
> After all, this isn't setting things in stone, once the GVN issue is resolved this can be revisited.

Seems fine to me, but could you add a comment, perhaps a link to this discussion?

Adding comment.

This revision was landed with ongoing or failed builds. Nov 3 2021, 9:25 AM

Closed by commit rG9c2469c1ddb3: [PassManager] `buildModuleOptimizationPipeline()`: schedule `LoopDeletion` pass… (authored by lebedev.ri).

This revision was automatically updated to reflect the committed changes.

lebedev.ri added a commit: rG9c2469c1ddb3: [PassManager] `buildModuleOptimizationPipeline()`: schedule `LoopDeletion` pass….

Harbormaster completed remote builds in B132253: Diff 384486. Nov 3 2021, 10:19 AM

Revision Contents

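| Path                                                                                 |     Size |
|--------------------------------------------------------------------------------------|---------:|
| llvm/lib/Passes/PassBuilderPipelines.cpp                                             |  9 lines |
| llvm/test/Other/new-pm-defaults.ll                                                   |   1 line |
| llvm/test/Other/new-pm-thinlto-defaults.ll                                           |   1 line |
| llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll                              |   1 line |
| llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll                        |   1 line |
| llvm/test/Transforms/PhaseOrdering/deletion-of-loops-that-became-side-effect-free.ll | 49 lines |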

Diff 384487

llvm/lib/Passes/PassBuilderPipelines.cpp

[...] PassBuilder::buildModuleOptimizationPipeline(OptimizationLevel Level,

   // Optimize the loop execution. These passes operate on entire loop nests
   // rather than on each loop in an inside-out manner, and so they are actually
   // function passes.
 
   for (auto &C : VectorizerStartEPCallbacks)
     C(OptimizePM, Level);
 
+  LoopPassManager LPM;
   // First rotate loops that may have been un-rotated by prior passes.
   // Disable header duplication at -Oz.
+  LPM.addPass(LoopRotatePass(Level != OptimizationLevel::Oz, LTOPreLink));
+  // Some loops may have become dead by now. Try to delete them.
+  // FIXME: see disscussion in https://reviews.llvm.org/D112851
+  // this may need to be revisited once GVN is more powerful.
+  LPM.addPass(LoopDeletionPass());
   OptimizePM.addPass(createFunctionToLoopPassAdaptor(
-      LoopRotatePass(Level != OptimizationLevel::Oz, LTOPreLink),
-      /*UseMemorySSA=*/false, /*UseBlockFrequencyInfo=*/false));
+      std::move(LPM), /*UseMemorySSA=*/false, /*UseBlockFrequencyInfo=*/false));
 
   // Distribute loops to allow partial vectorization. I.e. isolate dependences
   // into separate loop that would otherwise inhibit vectorization. This is
   // currently only performed for loops marked with the metadata
   // llvm.loop.distribute=true or when -enable-loop-distribute is specified.
   OptimizePM.addPass(LoopDistributePass());
 
   // Populates the VFABI attribute with the scalar-to-vector mappings
[...]

llvm/test/Other/new-pm-defaults.ll

[...]
 ; CHECK-MATRIX: Running pass: LowerMatrixIntrinsicsPass on f
 ; CHECK-MATRIX-NEXT: Running pass: EarlyCSEPass on f
 ; CHECK-EP-VECTORIZER-START-NEXT: Running pass: NoOpFunctionPass
 ; CHECK-EXT: Running pass: {{.*}}::Bye on foo
 ; CHECK-NOEXT: {{^}}
 ; CHECK-O-NEXT: Running pass: LoopSimplifyPass
 ; CHECK-O-NEXT: Running pass: LCSSAPass
 ; CHECK-O-NEXT: Running pass: LoopRotatePass
+; CHECK-O-NEXT: Running pass: LoopDeletionPass
 ; CHECK-O-NEXT: Running pass: LoopDistributePass
 ; CHECK-O-NEXT: Running pass: InjectTLIMappings
 ; CHECK-O-NEXT: Running pass: LoopVectorizePass
 ; CHECK-O-NEXT: Running analysis: BlockFrequencyAnalysis
 ; CHECK-O-NEXT: Running analysis: BranchProbabilityAnalysis
 ; CHECK-O-NEXT: Running pass: LoopLoadEliminationPass
 ; CHECK-O-NEXT: Running analysis: LoopAccessAnalysis
 ; CHECK-O-NEXT: Running pass: InstCombinePass
[...]

llvm/test/Other/new-pm-thinlto-defaults.ll

[...]
 ; CHECK-POSTLINK-O-NEXT: Running pass: ReversePostOrderFunctionAttrsPass
 ; CHECK-POSTLINK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA
 ; CHECK-POSTLINK-O-NEXT: Running pass: Float2IntPass
 ; CHECK-POSTLINK-O-NEXT: Running pass: LowerConstantIntrinsicsPass
 ; CHECK-EXT: Running pass: {{.*}}::Bye
 ; CHECK-POSTLINK-O-NEXT: Running pass: LoopSimplifyPass
 ; CHECK-POSTLINK-O-NEXT: Running pass: LCSSAPass
 ; CHECK-POSTLINK-O-NEXT: Running pass: LoopRotatePass
+; CHECK-POSTLINK-O-NEXT: Running pass: LoopDeletionPass
 ; CHECK-POSTLINK-O-NEXT: Running pass: LoopDistributePass
 ; CHECK-POSTLINK-O-NEXT: Running pass: InjectTLIMappings
 ; CHECK-POSTLINK-O-NEXT: Running pass: LoopVectorizePass
 ; CHECK-POSTLINK-O-NEXT: Running analysis: BlockFrequencyAnalysis
 ; CHECK-POSTLINK-O-NEXT: Running analysis: BranchProbabilityAnalysis
 ; CHECK-POSTLINK-O-NEXT: Running pass: LoopLoadEliminationPass
 ; CHECK-POSTLINK-O-NEXT: Running analysis: LoopAccessAnalysis
 ; CHECK-POSTLINK-O-NEXT: Running pass: InstCombinePass
[...]

llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll

[...]
 ; CHECK-O-NEXT: Running pass: ReversePostOrderFunctionAttrsPass
 ; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA
 ; CHECK-O-NEXT: Running pass: Float2IntPass
 ; CHECK-O-NEXT: Running pass: LowerConstantIntrinsicsPass
 ; CHECK-EXT: Running pass: {{.*}}::Bye
 ; CHECK-O-NEXT: Running pass: LoopSimplifyPass on foo
 ; CHECK-O-NEXT: Running pass: LCSSAPass on foo
 ; CHECK-O-NEXT: Running pass: LoopRotatePass
+; CHECK-O-NEXT: Running pass: LoopDeletionPass
 ; CHECK-O-NEXT: Running pass: LoopDistributePass
 ; CHECK-O-NEXT: Running pass: InjectTLIMappings
 ; CHECK-O-NEXT: Running pass: LoopVectorizePass
 ; CHECK-O-NEXT: Running pass: LoopLoadEliminationPass
 ; CHECK-O-NEXT: Running analysis: LoopAccessAnalysis
 ; CHECK-O-NEXT: Running pass: InstCombinePass
 ; CHECK-O-NEXT: Running pass: SimplifyCFGPass
 ; CHECK-O2-NEXT: Running pass: SLPVectorizerPass
[...]

llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll

[...]
 ; CHECK-O-NEXT: Running pass: ReversePostOrderFunctionAttrsPass
 ; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA
 ; CHECK-O-NEXT: Running pass: Float2IntPass
 ; CHECK-O-NEXT: Running pass: LowerConstantIntrinsicsPass
 ; CHECK-EXT: Running pass: {{.*}}::Bye
 ; CHECK-O-NEXT: Running pass: LoopSimplifyPass
 ; CHECK-O-NEXT: Running pass: LCSSAPass
 ; CHECK-O-NEXT: Running pass: LoopRotatePass
+; CHECK-O-NEXT: Running pass: LoopDeletionPass
 ; CHECK-O-NEXT: Running pass: LoopDistributePass
 ; CHECK-O-NEXT: Running pass: InjectTLIMappings
 ; CHECK-O-NEXT: Running pass: LoopVectorizePass
 ; CHECK-O-NEXT: Running pass: LoopLoadEliminationPass
 ; CHECK-O-NEXT: Running analysis: LoopAccessAnalysis
 ; CHECK-O-NEXT: Running pass: InstCombinePass
 ; CHECK-O-NEXT: Running pass: SimplifyCFGPass
 ; CHECK-O2-NEXT: Running pass: SLPVectorizerPass
[...]

llvm/test/Transforms/PhaseOrdering/deletion-of-loops-that-became-side-effect-free.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt -passes='default<O3>' -S < %s | FileCheck %s --check-prefixes=ALL,O3
 ; RUN: opt -passes='default<O2>' -S < %s | FileCheck %s --check-prefixes=ALL,O2
 ; RUN: opt -passes='default<O1>' -S < %s | FileCheck %s --check-prefixes=ALL,O1
 
 ; All these tests should optimize to a single comparison
 ; of the original argument with null. There should be no loops.
 
 %struct.node = type { %struct.node*, i32 }
 
 define dso_local zeroext i1 @is_not_empty_variant1(%struct.node* %p) {
 ; ALL-LABEL: @is_not_empty_variant1(
 ; ALL-NEXT: entry:
-; ALL-NEXT: [[TOBOOL_NOT3_I:%.*]] = icmp eq %struct.node* [[P:%.*]], null
-; ALL-NEXT: br i1 [[TOBOOL_NOT3_I]], label [[COUNT_NODES_VARIANT1_EXIT:%.*]], label [[WHILE_BODY_I:%.*]]
-; ALL: while.body.i:
-; ALL-NEXT: [[P_ADDR_04_I:%.*]] = phi %struct.node* [ [[TMP0:%.*]], [[WHILE_BODY_I]] ], [ [[P]], [[ENTRY:%.*]] ]
-; ALL-NEXT: [[NEXT_I:%.*]] = getelementptr inbounds [[STRUCT_NODE:%.*]], %struct.node* [[P_ADDR_04_I]], i64 0, i32 0
-; ALL-NEXT: [[TMP0]] = load %struct.node*, %struct.node** [[NEXT_I]], align 8
-; ALL-NEXT: [[TOBOOL_NOT_I:%.*]] = icmp eq %struct.node* [[TMP0]], null
-; ALL-NEXT: br i1 [[TOBOOL_NOT_I]], label [[COUNT_NODES_VARIANT1_EXIT]], label [[WHILE_BODY_I]], !llvm.loop [[LOOP0:![0-9]+]]
-; ALL: count_nodes_variant1.exit:
-; ALL-NEXT: [[TMP1:%.*]] = xor i1 [[TOBOOL_NOT3_I]], true
-; ALL-NEXT: ret i1 [[TMP1]]
+; ALL-NEXT: [[TOBOOL_NOT3_I:%.*]] = icmp ne %struct.node* [[P:%.*]], null
+; ALL-NEXT: ret i1 [[TOBOOL_NOT3_I]]
 ;
 entry:
   %p.addr = alloca %struct.node*, align 8
   store %struct.node* %p, %struct.node** %p.addr, align 8
   %0 = load %struct.node*, %struct.node** %p.addr, align 8
   %call = call i32 @count_nodes_variant1(%struct.node* %0)
   %cmp = icmp sgt i32 %call, 0
   ret i1 %cmp
[...]
 while.end:
   %6 = load i64, i64* %size, align 8
   %7 = bitcast i64* %size to i8*
   ret i64 %6
 }

 define dso_local zeroext i1 @is_not_empty_variant3(%struct.node* %p) {
 ; O3-LABEL: @is_not_empty_variant3(
 ; O3-NEXT: entry:
-; O3-NEXT: [[TOBOOL_NOT4_I:%.*]] = icmp eq %struct.node* [[P:%.*]], null
-; O3-NEXT: br i1 [[TOBOOL_NOT4_I]], label [[COUNT_NODES_VARIANT3_EXIT:%.*]], label [[WHILE_BODY_I:%.*]]
-; O3: while.body.i:
-; O3-NEXT: [[SIZE_06_I:%.*]] = phi i64 [ [[INC_I:%.*]], [[WHILE_BODY_I]] ], [ 0, [[ENTRY:%.*]] ]
-; O3-NEXT: [[P_ADDR_05_I:%.*]] = phi %struct.node* [ [[TMP0:%.*]], [[WHILE_BODY_I]] ], [ [[P]], [[ENTRY]] ]
-; O3-NEXT: [[CMP_I:%.*]] = icmp ne i64 [[SIZE_06_I]], -1
-; O3-NEXT: tail call void @llvm.assume(i1 [[CMP_I]]) #[[ATTR3:[0-9]+]]
-; O3-NEXT: [[NEXT_I:%.*]] = getelementptr inbounds [[STRUCT_NODE:%.*]], %struct.node* [[P_ADDR_05_I]], i64 0, i32 0
-; O3-NEXT: [[TMP0]] = load %struct.node*, %struct.node** [[NEXT_I]], align 8
-; O3-NEXT: [[INC_I]] = add nuw i64 [[SIZE_06_I]], 1
-; O3-NEXT: [[TOBOOL_NOT_I:%.*]] = icmp eq %struct.node* [[TMP0]], null
-; O3-NEXT: br i1 [[TOBOOL_NOT_I]], label [[COUNT_NODES_VARIANT3_EXIT]], label [[WHILE_BODY_I]], !llvm.loop [[LOOP2:![0-9]+]]
-; O3: count_nodes_variant3.exit:
-; O3-NEXT: [[TMP1:%.*]] = xor i1 [[TOBOOL_NOT4_I]], true
-; O3-NEXT: ret i1 [[TMP1]]
+; O3-NEXT: [[TOBOOL_NOT4_I:%.*]] = icmp ne %struct.node* [[P:%.*]], null
+; O3-NEXT: ret i1 [[TOBOOL_NOT4_I]]
 ;
 ; O2-LABEL: @is_not_empty_variant3(
 ; O2-NEXT: entry:
-; O2-NEXT: [[TOBOOL_NOT4_I:%.*]] = icmp eq %struct.node* [[P:%.*]], null
-; O2-NEXT: br i1 [[TOBOOL_NOT4_I]], label [[COUNT_NODES_VARIANT3_EXIT:%.*]], label [[WHILE_BODY_I:%.*]]
-; O2: while.body.i:
-; O2-NEXT: [[SIZE_06_I:%.*]] = phi i64 [ [[INC_I:%.*]], [[WHILE_BODY_I]] ], [ 0, [[ENTRY:%.*]] ]
-; O2-NEXT: [[P_ADDR_05_I:%.*]] = phi %struct.node* [ [[TMP0:%.*]], [[WHILE_BODY_I]] ], [ [[P]], [[ENTRY]] ]
-; O2-NEXT: [[CMP_I:%.*]] = icmp ne i64 [[SIZE_06_I]], -1
-; O2-NEXT: tail call void @llvm.assume(i1 [[CMP_I]]) #[[ATTR3:[0-9]+]]
-; O2-NEXT: [[NEXT_I:%.*]] = getelementptr inbounds [[STRUCT_NODE:%.*]], %struct.node* [[P_ADDR_05_I]], i64 0, i32 0
-; O2-NEXT: [[TMP0]] = load %struct.node*, %struct.node** [[NEXT_I]], align 8
-; O2-NEXT: [[INC_I]] = add nuw i64 [[SIZE_06_I]], 1
-; O2-NEXT: [[TOBOOL_NOT_I:%.*]] = icmp eq %struct.node* [[TMP0]], null
-; O2-NEXT: br i1 [[TOBOOL_NOT_I]], label [[COUNT_NODES_VARIANT3_EXIT]], label [[WHILE_BODY_I]], !llvm.loop [[LOOP2:![0-9]+]]
-; O2: count_nodes_variant3.exit:
-; O2-NEXT: [[TMP1:%.*]] = xor i1 [[TOBOOL_NOT4_I]], true
-; O2-NEXT: ret i1 [[TMP1]]
+; O2-NEXT: [[TOBOOL_NOT4_I:%.*]] = icmp ne %struct.node* [[P:%.*]], null
+; O2-NEXT: ret i1 [[TOBOOL_NOT4_I]]
 ;
 ; O1-LABEL: @is_not_empty_variant3(
 ; O1-NEXT: entry:
 ; O1-NEXT: [[TOBOOL_NOT4_I:%.*]] = icmp eq %struct.node* [[P:%.*]], null
 ; O1-NEXT: br i1 [[TOBOOL_NOT4_I]], label [[COUNT_NODES_VARIANT3_EXIT:%.*]], label [[WHILE_BODY_I:%.*]]
 ; O1: while.body.i:
 ; O1-NEXT: [[SIZE_06_I:%.*]] = phi i64 [ [[INC_I:%.*]], [[WHILE_BODY_I]] ], [ 0, [[ENTRY:%.*]] ]
 ; O1-NEXT: [[P_ADDR_05_I:%.*]] = phi %struct.node* [ [[TMP0:%.*]], [[WHILE_BODY_I]] ], [ [[P]], [[ENTRY]] ]
 ; O1-NEXT: [[CMP_I:%.*]] = icmp ne i64 [[SIZE_06_I]], -1
 ; O1-NEXT: call void @llvm.assume(i1 [[CMP_I]]) #[[ATTR3:[0-9]+]]
 ; O1-NEXT: [[NEXT_I:%.*]] = getelementptr inbounds [[STRUCT_NODE:%.*]], %struct.node* [[P_ADDR_05_I]], i64 0, i32 0
 ; O1-NEXT: [[TMP0]] = load %struct.node*, %struct.node** [[NEXT_I]], align 8
 ; O1-NEXT: [[INC_I]] = add i64 [[SIZE_06_I]], 1
 ; O1-NEXT: [[TOBOOL_NOT_I:%.*]] = icmp eq %struct.node* [[TMP0]], null
-; O1-NEXT: br i1 [[TOBOOL_NOT_I]], label [[COUNT_NODES_VARIANT3_EXIT_LOOPEXIT:%.*]], label [[WHILE_BODY_I]], !llvm.loop [[LOOP2:![0-9]+]]
+; O1-NEXT: br i1 [[TOBOOL_NOT_I]], label [[COUNT_NODES_VARIANT3_EXIT_LOOPEXIT:%.*]], label [[WHILE_BODY_I]], !llvm.loop [[LOOP0:![0-9]+]]
 ; O1: count_nodes_variant3.exit.loopexit:
 ; O1-NEXT: [[PHI_CMP:%.*]] = icmp ne i64 [[INC_I]], 0
 ; O1-NEXT: br label [[COUNT_NODES_VARIANT3_EXIT]]
 ; O1: count_nodes_variant3.exit:
 ; O1-NEXT: [[SIZE_0_LCSSA_I:%.*]] = phi i1 [ false, [[ENTRY]] ], [ [[PHI_CMP]], [[COUNT_NODES_VARIANT3_EXIT_LOOPEXIT]] ]
 ; O1-NEXT: ret i1 [[SIZE_0_LCSSA_I]]
 ;
 entry:
[...]
