This is an archive of the discontinued LLVM Phabricator instance.

Differential D80236

[VectorCombine] position pass after SLP in the optimization pipeline rather than before
ClosedPublic

Authored by spatel on May 19 2020, 1:26 PM.


Details

Reviewers

lebedev.ri
junparser
RKSimon

Commits

rG6438ea45e053: [VectorCombine] position pass after SLP in the optimization pipeline rather…

Summary

There are 2 known problem patterns shown in the test diffs here: vector horizontal ops (an x86 specialization) and vector reductions.
SLP has greater ability to match and fold those than vector-combine, so let SLP have first chance at that.
This is a quick fix while we continue to improve vector-combine and possibly canonicalize to reduction intrinsics.
In the longer term, we should improve matching of these patterns because if they were created in the "bad" forms shown here, then we would miss optimizing them.
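The ordering argument above can be illustrated with a toy model. This is only a sketch under stated assumptions: `Form`, `toyVectorCombine`, `toySLP`, and `runPipeline` are hypothetical names invented for illustration, not LLVM APIs, and the real passes are far more nuanced than a three-state enum.

```cpp
#include <functional>
#include <vector>

// Hypothetical stand-in for the shape of a reduction-like sequence.
enum class Form {
  ScalarChain,     // untouched extract + scalar binop chain
  PartiallyFolded, // some ops already rewritten locally (the "bad" form)
  Reduction        // fully matched reduction / horizontal op
};

using Pass = std::function<Form(Form)>;

// Toy vector-combine: folds what it can see locally.
Form toyVectorCombine(Form f) {
  return f == Form::ScalarChain ? Form::PartiallyFolded : f;
}

// Toy SLP: matches the whole chain, but only if nobody rewrote it first.
Form toySLP(Form f) {
  return f == Form::ScalarChain ? Form::Reduction : f;
}

// Run the passes in order and return the final form.
Form runPipeline(Form input, const std::vector<Pass> &passes) {
  Form f = input;
  for (const Pass &p : passes)
    f = p(f);
  return f;
}
```

In this model, running `{toyVectorCombine, toySLP}` leaves the code in `PartiallyFolded`, while `{toySLP, toyVectorCombine}` reaches `Reduction`. The same model also shows the longer-term concern: input that already arrives as `PartiallyFolded` is missed in either order.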

I'm not sure what is happening with alias analysis on the addsub test. The old pass manager now shows an extra line for that, and we see an improvement that comes from SLP vectorizing a store. I don't know what's missing with the new pass manager to make that happen. Strangely, I can't reproduce the behavior if I compile from C++ with clang and invoke the new PM with "-fexperimental-new-pass-manager".

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

spatel created this revision. May 19 2020, 1:26 PM

Herald added a project: Restricted Project. May 19 2020, 1:26 PM

Herald added subscribers: dexonsmith, steven_wu, hiraditya, mcrosier.

spatel mentioned this in D79078: [VectorCombine] Leave reduction operation to SLP. May 19 2020, 1:33 PM

LGTM

This revision is now accepted and ready to land. May 21 2020, 6:53 PM

RKSimon added inline comments. May 22 2020, 4:59 AM

llvm/test/Transforms/PhaseOrdering/X86/vector-reductions.ll
7–8	update comment?

spatel marked 2 inline comments as done. May 22 2020, 9:13 AM

spatel added inline comments.

llvm/test/Transforms/PhaseOrdering/X86/vector-reductions.ll
7–8	Right - will update on push.

spatel marked an inline comment as done. May 22 2020, 9:21 AM

Closed by commit rG6438ea45e053: [VectorCombine] position pass after SLP in the optimization pipeline rather… (authored by spatel). May 22 2020, 9:39 AM

This revision was automatically updated to reflect the committed changes.

Herald added subscribers: kerbowa, nhaehnle, jvesely. May 22 2020, 9:39 AM

This change caused a 0.25% compile-time regression. Looking at the pipeline test changes, this is probably because you do not preserve AAResultsWrapperPass inside VectorCombine.
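To make the cost of a non-preserved analysis concrete, here is a minimal sketch of a pass manager that caches one analysis. Everything in it (`ToyAnalysisCache`, `ToyPass`, `countAAComputations`) is a hypothetical model invented for illustration, not the LLVM pass manager API.

```cpp
#include <string>
#include <vector>

// Toy cache for one expensive analysis; counts how often it is rebuilt.
struct ToyAnalysisCache {
  bool valid = false;
  int computations = 0;

  // Called on behalf of a pass that needs the analysis.
  void require() {
    if (!valid) {
      ++computations;
      valid = true;
    }
  }
  // Called after a pass that does not declare the analysis preserved.
  void invalidate() { valid = false; }
};

// Hypothetical pass description: does it use AA, and does it preserve it?
struct ToyPass {
  std::string name;
  bool requiresAA;
  bool preservesAA;
};

// Walk the pipeline and report how many times AA had to be computed.
int countAAComputations(const std::vector<ToyPass> &pipeline) {
  ToyAnalysisCache aa;
  for (const ToyPass &p : pipeline) {
    if (p.requiresAA)
      aa.require();
    if (!p.preservesAA)
      aa.invalidate();
  }
  return aa.computations;
}
```

With a pipeline of `slp` (uses and preserves AA), `vector-combine` (does not preserve it), and `early-cse` (uses AA), the model computes AA twice; marking `vector-combine` as preserving drops that to once. This mirrors the extra "Function Alias Analysis Results" line that appears in the legacy pipeline tests.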

spatel mentioned this in rG024098ae5349: [VectorCombine] set preserve alias analysis. May 22 2020, 1:26 PM

In D80236#2051460, @nikic wrote:

This change caused a 0.25% compile-time regression. Looking at the pipeline test changes, this is probably because you do not preserve AAResultsWrapperPass inside VectorCombine.

Aha...thanks for pointing out the fix:
rG024098ae5349

There's still something wrong with alias analysis in the pipeline because that addsub test is getting folded in the old PM, but not the new PM.

@spatel Thanks for the quick fix! Unfortunately, this seems to be only part of the story. Preserving AA improved things, but only a bit.

As I don't see anything else here, this probably affects the amount of generated IR, though the final code-size changes aren't particularly large. So maybe nothing to be done.

It turns out that the main impact of this change (both in terms of compile-time, and in terms of text size changes) is not the move of the VectorCombine pass, but the move of the EarlyCSE pass. If I leave EarlyCSE where it is and only move VectorCombine, the result is essentially noise. (Which doesn't tell us anything about which EarlyCSE placement produces better code...)

In D80236#2051972, @nikic wrote:

It turns out that the main impact of this change (both in terms of compile-time, and in terms of text size changes) is not the move of the VectorCombine pass, but the move of the EarlyCSE pass. If I leave EarlyCSE where it is and only move VectorCombine, the result is essentially noise. (Which doesn't tell us anything about which EarlyCSE placement produces better code...)

I added the EarlyCSE cleanup as part of D75145. And it was noted there as causing perf regressions for ARM code (cc @dmgreen), so it was not ideal from the start.

I just commented out EarlyCSE locally from the pipeline, and we no longer need that on the motivating test from D75145. (Not exactly sure why - several things changed since then.)

But there is a regression with the new pass manager on the "add_aggregate_store" test that is changing here. Do you know why alias analysis is not working the same on that test for new and old PMs?

If we can fix that, then we can remove EarlyCSE from the pipeline with no immediate regressions (getting full test-suite/other results that agree with that would be ideal, but we could push the change and wait for fallout).

@spatel Apparently you need to explicitly specify -aa-pipeline=default to get a default AA pipeline.

spatel mentioned this in rGd43fac052e16: [PhaseOrdering] adjust test to use default alias analysis with new pass manager…. May 24 2020, 8:32 AM

spatel mentioned this in rG57bb4787d72f: [Pass Manager] remove EarlyCSE as clean-up for VectorCombine. May 24 2020, 9:37 AM

In D80236#2052210, @nikic wrote:

@spatel Apparently you need to explicitly specify -aa-pipeline=default to get a default AA pipeline.

Thanks for the hint! I added that to the regression test, so that lets us remove the extra -early-cse from the pipeline:
rG57bb4787d72f

Hopefully, that improves compile-time as expected.

In D80236#2052034, @spatel wrote:

I added the EarlyCSE cleanup as part of D75145. And it was noted there as causing perf regressions for ARM code (cc @dmgreen), so it was not ideal from the start.

Yeah, we did see some performance changes from this patch, and I think we will again from 57bb4787d72f. They were much larger when running under LTO than without. We are currently in the process of moving those benchmarks over to not run under LTO, so I happened to have both. Without LTO the changes were all within ±2%, so not anything to worry about. It suggests something funny might be going on with LTO, where we run the entire pass pipeline twice.

It feels quite common in the pass pipeline to want to write:

MPM.add(createLowerMatrixIntrinsicsPass());
MPM.add(createEarlyCSEPass(false)); // cleanup

(That's just an example I noticed. I've wanted similar things for loop unrolling in the past.) It's a shame to run the cleanup passes on all the code you compile, just because one fairly rare thing needed it. Perhaps it would make sense to make some of these things utilities as opposed to running them as passes, in order to target them at the code we know has changed and would benefit from it.
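The utility-versus-pass idea can be sketched as follows. This is a toy under stated assumptions: `ToyFunction`, `toyLower`, `toyCleanup`, and the two drivers are hypothetical names, and the "cleanup" is a placeholder for something like a CSE run.

```cpp
#include <vector>

// Hypothetical function summary: does it contain the rare pattern, and
// did a transform leave redundant code behind?
struct ToyFunction {
  bool hasRarePattern;
  bool needsCleanup;
};

// Toy transform: returns true only if it actually changed the function.
bool toyLower(ToyFunction &f) {
  if (!f.hasRarePattern)
    return false;
  f.hasRarePattern = false;
  f.needsCleanup = true;
  return true;
}

// Toy cleanup standing in for an expensive pass.
void toyCleanup(ToyFunction &f) { f.needsCleanup = false; }

// Pass-style pipeline: cleanup runs on every function unconditionally.
int cleanupEverywhere(std::vector<ToyFunction> &fns) {
  int cleanupRuns = 0;
  for (ToyFunction &f : fns)
    toyLower(f);
  for (ToyFunction &f : fns) {
    toyCleanup(f);
    ++cleanupRuns;
  }
  return cleanupRuns;
}

// Utility-style: cleanup runs only where the transform reported a change.
int cleanupWhereChanged(std::vector<ToyFunction> &fns) {
  int cleanupRuns = 0;
  for (ToyFunction &f : fns) {
    if (toyLower(f)) {
      toyCleanup(f);
      ++cleanupRuns;
    }
  }
  return cleanupRuns;
}
```

For four functions of which one contains the pattern, the first driver performs four cleanup runs and the second performs one, with the same end state.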

Perhaps it would make sense to make some of these things utilities as opposed to running them as passes, in order to target them at the code we know has changed and would benefit from it.

Just for info, downstream this caused some benchmark regressions for our OOT target.

Here is my analysis:

We've got some additions in the LoopVectorizer that insert some more guards for taking the vector/scalar path.
Those branch conditions might be loop invariant, and might look the same for several loops in a nest.
When running EarlyCSE after LoopVectorizer (and before CFGSimplification), those branch conditions are CSE'd, so the branches use the same condition value rather than merely equivalent subexpressions.
I got big diffs after CFGSimplification depending on whether I remove EarlyCSE or not. I guess CFG simplification benefits from seeing that branches use the same condition (it depends on code being CSE'd rather than comparing subexpressions).
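A minimal sketch of why CSE helps a later identity-based check, assuming a toy expression representation. `ToyExpr`, `toyCSE`, and `sameCondition` are hypothetical names and do not correspond to EarlyCSE or SimplifyCFG internals.

```cpp
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Toy expression: an opcode applied to two operand value ids.
struct ToyExpr {
  std::string op;
  int lhs;
  int rhs;
};

// Toy CSE: for each expression, return the index of the first structurally
// identical expression. Equal results mean "same value after CSE".
std::vector<int> toyCSE(const std::vector<ToyExpr> &exprs) {
  std::map<std::tuple<std::string, int, int>, int> seen;
  std::vector<int> canonical;
  for (int i = 0; i < static_cast<int>(exprs.size()); ++i) {
    auto key = std::make_tuple(exprs[i].op, exprs[i].lhs, exprs[i].rhs);
    // emplace keeps the existing entry if the key was seen before.
    auto it = seen.emplace(key, i).first;
    canonical.push_back(it->second);
  }
  return canonical;
}

// Cheap check that compares value identity only and does not look
// through subexpressions.
bool sameCondition(const std::vector<int> &valueIds, int a, int b) {
  return valueIds[a] == valueIds[b];
}
```

Before CSE, two guards computing `icmp ult %1, %2` are distinct values, so `sameCondition` fails on them; after `toyCSE` they share one id and the check succeeds, which is the kind of fact the later simplification in the analysis above is guessed to rely on.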

I can mention that we use ExtraVectorizerPasses in that test, which runs LICM/CFGSimplification etc an extra time before SLP. And I haven't checked if we got the same problem with the regular CFGSimplification.

I'll probably just add an extra run of EarlyCSE among the ExtraVectorizerPasses to solve this downstream.

Given that our downstream additions in LoopVectorizer seem to be involved somehow, this could be a more common scenario for us compared to the upstream code base. We could probably handcraft a lit test to show this, but I guess that wouldn't justify adding EarlyCSE back to the pipeline if it isn't seen in benchmarks run on the upstream code base.

Just wanted to let you know about a potential scenario where EarlyCSE makes a difference here.

Thanks for the feedback. I filed https://bugs.llvm.org/show_bug.cgi?id=46065 for making a CSE utility.

@bjope Tracing back the changes here, what happened is that originally there was an EarlyCSE run in ExtraVectorizerPasses, then it got moved into the main pipeline as part of the VectorCombine introduction, then it got moved after SLP in this patch and then it got dropped entirely afterwards. So the EarlyCSE run that was present in ExtraVectorizerPasses is now gone as a side-effect of this shuffling around. So I'd say, feel free to just add it back (it was the first pass in the https://github.com/llvm/llvm-project/blob/2dc664d578f0e9c8ea5975eed745e322fa77bffe/llvm/lib/Transforms/IPO/PassManagerBuilder.cpp#L734 block) and restore the status quo of LLVM 10.

In D80236#2091336, @nikic wrote:

@bjope Tracing back the changes here, what happened is that originally there was an EarlyCSE run in ExtraVectorizerPasses, then it got moved into the main pipeline as part of the VectorCombine introduction, then it got moved after SLP in this patch and then it got dropped entirely afterwards. So the EarlyCSE run that was present in ExtraVectorizerPasses is now gone as a side-effect of this shuffling around. So I'd say, feel free to just add it back (it was the first pass in the https://github.com/llvm/llvm-project/blob/2dc664d578f0e9c8ea5975eed745e322fa77bffe/llvm/lib/Transforms/IPO/PassManagerBuilder.cpp#L734 block) and restore the status quo of LLVM 10.

Thanks for tracking that down! Hopefully back to LLVM10 state with:
rG098e48a6a15

spatel mentioned this in rG098e48a6a155: [PassManager] restore early-cse to vector cleanup. Jun 14 2020, 7:29 AM

In D80236#2091847, @spatel wrote:

In D80236#2091336, @nikic wrote:

@bjope Tracing back the changes here, what happened is that originally there was an EarlyCSE run in ExtraVectorizerPasses, then it got moved into the main pipeline as part of the VectorCombine introduction, then it got moved after SLP in this patch and then it got dropped entirely afterwards. So the EarlyCSE run that was present in ExtraVectorizerPasses is now gone as a side-effect of this shuffling around. So I'd say, feel free to just add it back (it was the first pass in the https://github.com/llvm/llvm-project/blob/2dc664d578f0e9c8ea5975eed745e322fa77bffe/llvm/lib/Transforms/IPO/PassManagerBuilder.cpp#L734 block) and restore the status quo of LLVM 10.

Thanks for tracking that down! Hopefully back to LLVM10 state with:
rG098e48a6a15

Thanks! Both for the detective work and the patch.

Revision Contents

Path                                                                  Size
llvm/lib/Passes/PassBuilder.cpp                                       7 lines
llvm/lib/Transforms/IPO/PassManagerBuilder.cpp                        6 lines
llvm/test/CodeGen/AMDGPU/opt-pipeline.ll                              14 lines
llvm/test/Other/new-pm-defaults.ll                                    4 lines
llvm/test/Other/new-pm-thinlto-defaults.ll                            4 lines
llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll               4 lines
llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll         4 lines
llvm/test/Other/opt-O2-pipeline.ll                                    5 lines
llvm/test/Other/opt-O3-pipeline.ll                                    5 lines
llvm/test/Other/opt-Os-pipeline.ll                                    5 lines
llvm/test/Other/opt-pipeline-vector-passes.ll                         12 lines
llvm/test/Transforms/PhaseOrdering/X86/addsub.ll                      44 lines
llvm/test/Transforms/PhaseOrdering/X86/horiz-math.ll                  33 lines
llvm/test/Transforms/PhaseOrdering/X86/vector-reductions.ll           16 lines

Diff 265758

llvm/lib/Passes/PassBuilder.cpp

[980 lines not shown] ModulePassManager PassBuilder::buildModuleOptimizationPipeline(
   // Populates the VFABI attribute with the scalar-to-vector mappings
   // from the TargetLibraryInfo.
   OptimizePM.addPass(InjectTLIMappings());

   // Now run the core loop vectorizer.
   OptimizePM.addPass(LoopVectorizePass(
       LoopVectorizeOptions(!PTO.LoopInterleaving, !PTO.LoopVectorization)));

-  // Enhance/cleanup vector code.
-  OptimizePM.addPass(VectorCombinePass());
-  OptimizePM.addPass(EarlyCSEPass());
-
   // Eliminate loads by forwarding stores from the previous iteration to loads
   // of the current iteration.
   OptimizePM.addPass(LoopLoadEliminationPass());

   // Cleanup after the loop optimization passes.
   OptimizePM.addPass(InstCombinePass());

   // Now that we've formed fast to execute loop structures, we do further
[10 lines not shown] OptimizePM.addPass(SimplifyCFGPass(SimplifyCFGOptions().
       convertSwitchToLookupTable(true).
       needCanonicalLoops(false).
       sinkCommonInsts(true)));

   // Optimize parallel scalar instruction chains into SIMD instructions.
   if (PTO.SLPVectorization)
     OptimizePM.addPass(SLPVectorizerPass());

+  // Enhance/cleanup vector code.
+  OptimizePM.addPass(VectorCombinePass());
+  OptimizePM.addPass(EarlyCSEPass());
   OptimizePM.addPass(InstCombinePass());

   // Unroll small loops to hide loop backedge latency and saturate any parallel
   // execution resources of an out-of-order processor. We also then need to
   // clean up redundancies and loop invariant code.
   // FIXME: It would be really good to use a loop-integrated instruction
   // combiner for cleanup here so that the unrolling and LICM can be pipelined
   // across the loop nests.
[1,487 lines not shown]

llvm/lib/Transforms/IPO/PassManagerBuilder.cpp

[735 lines not shown] void PassManagerBuilder::populateModulePassManager(

   // Distribute loops to allow partial vectorization. I.e. isolate dependences
   // into separate loop that would otherwise inhibit vectorization. This is
   // currently only performed for loops marked with the metadata
   // llvm.loop.distribute=true or when -enable-loop-distribute is specified.
   MPM.add(createLoopDistributePass());

   MPM.add(createLoopVectorizePass(!LoopsInterleaved, !LoopVectorize));
-  MPM.add(createVectorCombinePass());
-  MPM.add(createEarlyCSEPass());

   // Eliminate loads by forwarding stores from the previous iteration to loads
   // of the current iteration.
   MPM.add(createLoopLoadEliminationPass());

   // FIXME: Because of #pragma vectorize enable, the passes below are always
   // inserted in the pipeline, even when the vectorizer doesn't run (ex. when
   // on -O1 and no #pragma is found). Would be good to have these two passes
[24 lines not shown] void PassManagerBuilder::populateModulePassManager(

   if (SLPVectorize) {
     MPM.add(createSLPVectorizerPass()); // Vectorize parallel scalar chains.
     if (OptLevel > 1 && ExtraVectorizerPasses) {
       MPM.add(createEarlyCSEPass());
     }
   }

+  // Enhance/cleanup vector code.
+  MPM.add(createVectorCombinePass());
+  MPM.add(createEarlyCSEPass());
+
   addExtensionsToPM(EP_Peephole, MPM);
   MPM.add(createInstructionCombiningPass());

   if (EnableUnrollAndJam && !DisableUnrollLoops) {
     // Unroll and Jam. We do this before unroll but need to be in a separate
     // loop pass manager in order for the outer loop to be processed by
     // unroll and jam before the inner loop is unrolled.
     MPM.add(createLoopUnrollAndJamPass(OptLevel));
[421 lines not shown]

llvm/test/CodeGen/AMDGPU/opt-pipeline.ll

[224 lines not shown]
 ; GCN-O1-NEXT: Function Alias Analysis Results
 ; GCN-O1-NEXT: Loop Access Analysis
 ; GCN-O1-NEXT: Demanded bits analysis
 ; GCN-O1-NEXT: Lazy Branch Probability Analysis
 ; GCN-O1-NEXT: Lazy Block Frequency Analysis
 ; GCN-O1-NEXT: Optimization Remark Emitter
 ; GCN-O1-NEXT: Inject TLI Mappings
 ; GCN-O1-NEXT: Loop Vectorization
-; GCN-O1-NEXT: Optimize scalar/vector ops
-; GCN-O1-NEXT: Early CSE
 ; GCN-O1-NEXT: Canonicalize natural loops
 ; GCN-O1-NEXT: Scalar Evolution Analysis
 ; GCN-O1-NEXT: Function Alias Analysis Results
 ; GCN-O1-NEXT: Loop Access Analysis
 ; GCN-O1-NEXT: Lazy Branch Probability Analysis
 ; GCN-O1-NEXT: Lazy Block Frequency Analysis
 ; GCN-O1-NEXT: Loop Load Elimination
 ; GCN-O1-NEXT: Basic Alias Analysis (stateless AA impl)
 ; GCN-O1-NEXT: Function Alias Analysis Results
 ; GCN-O1-NEXT: Lazy Branch Probability Analysis
 ; GCN-O1-NEXT: Lazy Block Frequency Analysis
 ; GCN-O1-NEXT: Optimization Remark Emitter
 ; GCN-O1-NEXT: Combine redundant instructions
 ; GCN-O1-NEXT: Simplify the CFG
 ; GCN-O1-NEXT: Dominator Tree Construction
+; GCN-O1-NEXT: Optimize scalar/vector ops
+; GCN-O1-NEXT: Early CSE
 ; GCN-O1-NEXT: Basic Alias Analysis (stateless AA impl)
 ; GCN-O1-NEXT: Function Alias Analysis Results
 ; GCN-O1-NEXT: Natural Loop Information
 ; GCN-O1-NEXT: Lazy Branch Probability Analysis
 ; GCN-O1-NEXT: Lazy Block Frequency Analysis
 ; GCN-O1-NEXT: Optimization Remark Emitter
 ; GCN-O1-NEXT: Combine redundant instructions
 ; GCN-O1-NEXT: Canonicalize natural loops
[308 lines not shown]
 ; GCN-O2-NEXT: Function Alias Analysis Results
 ; GCN-O2-NEXT: Loop Access Analysis
 ; GCN-O2-NEXT: Demanded bits analysis
 ; GCN-O2-NEXT: Lazy Branch Probability Analysis
 ; GCN-O2-NEXT: Lazy Block Frequency Analysis
 ; GCN-O2-NEXT: Optimization Remark Emitter
 ; GCN-O2-NEXT: Inject TLI Mappings
 ; GCN-O2-NEXT: Loop Vectorization
-; GCN-O2-NEXT: Optimize scalar/vector ops
-; GCN-O2-NEXT: Early CSE
 ; GCN-O2-NEXT: Canonicalize natural loops
 ; GCN-O2-NEXT: Scalar Evolution Analysis
 ; GCN-O2-NEXT: Function Alias Analysis Results
 ; GCN-O2-NEXT: Loop Access Analysis
 ; GCN-O2-NEXT: Lazy Branch Probability Analysis
 ; GCN-O2-NEXT: Lazy Block Frequency Analysis
 ; GCN-O2-NEXT: Loop Load Elimination
 ; GCN-O2-NEXT: Basic Alias Analysis (stateless AA impl)
[9 lines not shown]
 ; GCN-O2-NEXT: Basic Alias Analysis (stateless AA impl)
 ; GCN-O2-NEXT: Function Alias Analysis Results
 ; GCN-O2-NEXT: Demanded bits analysis
 ; GCN-O2-NEXT: Lazy Branch Probability Analysis
 ; GCN-O2-NEXT: Lazy Block Frequency Analysis
 ; GCN-O2-NEXT: Optimization Remark Emitter
 ; GCN-O2-NEXT: Inject TLI Mappings
 ; GCN-O2-NEXT: SLP Vectorizer
+; GCN-O2-NEXT: Optimize scalar/vector ops
+; GCN-O2-NEXT: Early CSE
+; GCN-O2-NEXT: Function Alias Analysis Results
 ; GCN-O2-NEXT: Optimization Remark Emitter
 ; GCN-O2-NEXT: Combine redundant instructions
 ; GCN-O2-NEXT: Canonicalize natural loops
 ; GCN-O2-NEXT: LCSSA Verifier
 ; GCN-O2-NEXT: Loop-Closed SSA Form Pass
 ; GCN-O2-NEXT: Scalar Evolution Analysis
 ; GCN-O2-NEXT: Loop Pass Manager
 ; GCN-O2-NEXT: Unroll loops
[310 lines not shown]
 ; GCN-O3-NEXT: Function Alias Analysis Results
 ; GCN-O3-NEXT: Loop Access Analysis
 ; GCN-O3-NEXT: Demanded bits analysis
 ; GCN-O3-NEXT: Lazy Branch Probability Analysis
 ; GCN-O3-NEXT: Lazy Block Frequency Analysis
 ; GCN-O3-NEXT: Optimization Remark Emitter
 ; GCN-O3-NEXT: Inject TLI Mappings
 ; GCN-O3-NEXT: Loop Vectorization
-; GCN-O3-NEXT: Optimize scalar/vector ops
-; GCN-O3-NEXT: Early CSE
 ; GCN-O3-NEXT: Canonicalize natural loops
 ; GCN-O3-NEXT: Scalar Evolution Analysis
 ; GCN-O3-NEXT: Function Alias Analysis Results
 ; GCN-O3-NEXT: Loop Access Analysis
 ; GCN-O3-NEXT: Lazy Branch Probability Analysis
 ; GCN-O3-NEXT: Lazy Block Frequency Analysis
 ; GCN-O3-NEXT: Loop Load Elimination
 ; GCN-O3-NEXT: Basic Alias Analysis (stateless AA impl)
[9 lines not shown]
 ; GCN-O3-NEXT: Basic Alias Analysis (stateless AA impl)
 ; GCN-O3-NEXT: Function Alias Analysis Results
 ; GCN-O3-NEXT: Demanded bits analysis
 ; GCN-O3-NEXT: Lazy Branch Probability Analysis
 ; GCN-O3-NEXT: Lazy Block Frequency Analysis
 ; GCN-O3-NEXT: Optimization Remark Emitter
 ; GCN-O3-NEXT: Inject TLI Mappings
 ; GCN-O3-NEXT: SLP Vectorizer
+; GCN-O3-NEXT: Optimize scalar/vector ops
+; GCN-O3-NEXT: Early CSE
+; GCN-O3-NEXT: Function Alias Analysis Results
 ; GCN-O3-NEXT: Optimization Remark Emitter
 ; GCN-O3-NEXT: Combine redundant instructions
 ; GCN-O3-NEXT: Canonicalize natural loops
 ; GCN-O3-NEXT: LCSSA Verifier
 ; GCN-O3-NEXT: Loop-Closed SSA Form Pass
 ; GCN-O3-NEXT: Scalar Evolution Analysis
 ; GCN-O3-NEXT: Loop Pass Manager
 ; GCN-O3-NEXT: Unroll loops
[70 lines not shown]

llvm/test/Other/new-pm-defaults.ll

[247 lines not shown]
 ; CHECK-O-NEXT: Running pass: LoopSimplifyPass
 ; CHECK-O-NEXT: Running pass: LCSSAPass
 ; CHECK-O-NEXT: Finished llvm::Function pass manager run.
 ; CHECK-O-NEXT: Running pass: LoopDistributePass
 ; CHECK-O-NEXT: Running pass: InjectTLIMappings
 ; CHECK-O-NEXT: Running pass: LoopVectorizePass
 ; CHECK-O-NEXT: Running analysis: BlockFrequencyAnalysis
 ; CHECK-O-NEXT: Running analysis: BranchProbabilityAnalysis
-; CHECK-O-NEXT: Running pass: VectorCombinePass
-; CHECK-O-NEXT: Running pass: EarlyCSEPass
 ; CHECK-O-NEXT: Running pass: LoopLoadEliminationPass
 ; CHECK-O-NEXT: Running analysis: LoopAccessAnalysis
 ; CHECK-O-NEXT: Running pass: InstCombinePass
 ; CHECK-O-NEXT: Running pass: SimplifyCFGPass
 ; CHECK-O2-NEXT: Running pass: SLPVectorizerPass
 ; CHECK-O3-NEXT: Running pass: SLPVectorizerPass
 ; CHECK-Os-NEXT: Running pass: SLPVectorizerPass
+; CHECK-O-NEXT: Running pass: VectorCombinePass
+; CHECK-O-NEXT: Running pass: EarlyCSEPass
 ; CHECK-O-NEXT: Running pass: InstCombinePass
 ; CHECK-O-NEXT: Running pass: LoopUnrollPass
 ; CHECK-O-NEXT: Running pass: WarnMissedTransformationsPass
 ; CHECK-O-NEXT: Running pass: InstCombinePass
 ; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}OptimizationRemarkEmitterAnalysis
 ; CHECK-O-NEXT: Running pass: FunctionToLoopPassAdaptor<{{.*}}LICMPass
 ; CHECK-O-NEXT: Starting llvm::Function pass manager run.
 ; CHECK-O-NEXT: Running pass: LoopSimplifyPass
[47 lines not shown]

llvm/test/Other/new-pm-thinlto-defaults.ll

	Show First 20 Lines • Show All 217 Lines • ▼ Show 20 Lines
	; CHECK-POSTLINK-O-NEXT: Running pass: LoopSimplifyPass			; CHECK-POSTLINK-O-NEXT: Running pass: LoopSimplifyPass
	; CHECK-POSTLINK-O-NEXT: Running pass: LCSSAPass			; CHECK-POSTLINK-O-NEXT: Running pass: LCSSAPass
	; CHECK-POSTLINK-O-NEXT: Finished llvm::Function pass manager run			; CHECK-POSTLINK-O-NEXT: Finished llvm::Function pass manager run
	; CHECK-POSTLINK-O-NEXT: Running pass: LoopDistributePass			; CHECK-POSTLINK-O-NEXT: Running pass: LoopDistributePass
	; CHECK-POSTLINK-O-NEXT: Running pass: InjectTLIMappings			; CHECK-POSTLINK-O-NEXT: Running pass: InjectTLIMappings
	; CHECK-POSTLINK-O-NEXT: Running pass: LoopVectorizePass			; CHECK-POSTLINK-O-NEXT: Running pass: LoopVectorizePass
	; CHECK-POSTLINK-O-NEXT: Running analysis: BlockFrequencyAnalysis			; CHECK-POSTLINK-O-NEXT: Running analysis: BlockFrequencyAnalysis
	; CHECK-POSTLINK-O-NEXT: Running analysis: BranchProbabilityAnalysis			; CHECK-POSTLINK-O-NEXT: Running analysis: BranchProbabilityAnalysis
	; CHECK-POSTLINK-O-NEXT: Running pass: VectorCombinePass
	; CHECK-POSTLINK-O-NEXT: Running pass: EarlyCSEPass
	; CHECK-POSTLINK-O-NEXT: Running pass: LoopLoadEliminationPass			; CHECK-POSTLINK-O-NEXT: Running pass: LoopLoadEliminationPass
	; CHECK-POSTLINK-O-NEXT: Running analysis: LoopAccessAnalysis			; CHECK-POSTLINK-O-NEXT: Running analysis: LoopAccessAnalysis
	; CHECK-POSTLINK-O-NEXT: Running pass: InstCombinePass			; CHECK-POSTLINK-O-NEXT: Running pass: InstCombinePass
	; CHECK-POSTLINK-O-NEXT: Running pass: SimplifyCFGPass			; CHECK-POSTLINK-O-NEXT: Running pass: SimplifyCFGPass
	; CHECK-POSTLINK-O2-NEXT: Running pass: SLPVectorizerPass			; CHECK-POSTLINK-O2-NEXT: Running pass: SLPVectorizerPass
	; CHECK-POSTLINK-O3-NEXT: Running pass: SLPVectorizerPass			; CHECK-POSTLINK-O3-NEXT: Running pass: SLPVectorizerPass
	; CHECK-POSTLINK-Os-NEXT: Running pass: SLPVectorizerPass			; CHECK-POSTLINK-Os-NEXT: Running pass: SLPVectorizerPass
				; CHECK-POSTLINK-O-NEXT: Running pass: VectorCombinePass
				; CHECK-POSTLINK-O-NEXT: Running pass: EarlyCSEPass
	; CHECK-POSTLINK-O-NEXT: Running pass: InstCombinePass			; CHECK-POSTLINK-O-NEXT: Running pass: InstCombinePass
	; CHECK-POSTLINK-O-NEXT: Running pass: LoopUnrollPass			; CHECK-POSTLINK-O-NEXT: Running pass: LoopUnrollPass
	; CHECK-POSTLINK-O-NEXT: Running pass: WarnMissedTransformationsPass			; CHECK-POSTLINK-O-NEXT: Running pass: WarnMissedTransformationsPass
	; CHECK-POSTLINK-O-NEXT: Running pass: InstCombinePass			; CHECK-POSTLINK-O-NEXT: Running pass: InstCombinePass
	; CHECK-POSTLINK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}OptimizationRemarkEmitterAnalysis			; CHECK-POSTLINK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}OptimizationRemarkEmitterAnalysis
	; CHECK-POSTLINK-O-NEXT: Running pass: FunctionToLoopPassAdaptor<{{.*}}LICMPass			; CHECK-POSTLINK-O-NEXT: Running pass: FunctionToLoopPassAdaptor<{{.*}}LICMPass
	; CHECK-POSTLINK-O-NEXT: Starting llvm::Function pass manager run			; CHECK-POSTLINK-O-NEXT: Starting llvm::Function pass manager run
	; CHECK-POSTLINK-O-NEXT: Running pass: LoopSimplifyPass			; CHECK-POSTLINK-O-NEXT: Running pass: LoopSimplifyPass

llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll

	; CHECK-O-NEXT: Running pass: FunctionToLoopPassAdaptor<{{.*}}LoopRotatePass			; CHECK-O-NEXT: Running pass: FunctionToLoopPassAdaptor<{{.*}}LoopRotatePass
	; CHECK-O-NEXT: Starting {{.*}}Function pass manager run			; CHECK-O-NEXT: Starting {{.*}}Function pass manager run
	; CHECK-O-NEXT: Running pass: LoopSimplifyPass			; CHECK-O-NEXT: Running pass: LoopSimplifyPass
	; CHECK-O-NEXT: Running pass: LCSSAPass			; CHECK-O-NEXT: Running pass: LCSSAPass
	; CHECK-O-NEXT: Finished {{.*}}Function pass manager run			; CHECK-O-NEXT: Finished {{.*}}Function pass manager run
	; CHECK-O-NEXT: Running pass: LoopDistributePass			; CHECK-O-NEXT: Running pass: LoopDistributePass
	; CHECK-O-NEXT: Running pass: InjectTLIMappings			; CHECK-O-NEXT: Running pass: InjectTLIMappings
	; CHECK-O-NEXT: Running pass: LoopVectorizePass			; CHECK-O-NEXT: Running pass: LoopVectorizePass
	; CHECK-O-NEXT: Running pass: VectorCombinePass
	; CHECK-O-NEXT: Running pass: EarlyCSEPass
	; CHECK-O-NEXT: Running pass: LoopLoadEliminationPass			; CHECK-O-NEXT: Running pass: LoopLoadEliminationPass
	; CHECK-O-NEXT: Running analysis: LoopAccessAnalysis			; CHECK-O-NEXT: Running analysis: LoopAccessAnalysis
	; CHECK-O-NEXT: Running pass: InstCombinePass			; CHECK-O-NEXT: Running pass: InstCombinePass
	; CHECK-O-NEXT: Running pass: SimplifyCFGPass			; CHECK-O-NEXT: Running pass: SimplifyCFGPass
	; CHECK-O2-NEXT: Running pass: SLPVectorizerPass			; CHECK-O2-NEXT: Running pass: SLPVectorizerPass
	; CHECK-O3-NEXT: Running pass: SLPVectorizerPass			; CHECK-O3-NEXT: Running pass: SLPVectorizerPass
	; CHECK-Os-NEXT: Running pass: SLPVectorizerPass			; CHECK-Os-NEXT: Running pass: SLPVectorizerPass
				; CHECK-O-NEXT: Running pass: VectorCombinePass
				; CHECK-O-NEXT: Running pass: EarlyCSEPass
	; CHECK-O-NEXT: Running pass: InstCombinePass			; CHECK-O-NEXT: Running pass: InstCombinePass
	; CHECK-O-NEXT: Running pass: LoopUnrollPass			; CHECK-O-NEXT: Running pass: LoopUnrollPass
	; CHECK-O-NEXT: Running pass: WarnMissedTransformationsPass			; CHECK-O-NEXT: Running pass: WarnMissedTransformationsPass
	; CHECK-O-NEXT: Running pass: InstCombinePass			; CHECK-O-NEXT: Running pass: InstCombinePass
	; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}OptimizationRemarkEmitterAnalysis			; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}OptimizationRemarkEmitterAnalysis
	; CHECK-O-NEXT: Running pass: FunctionToLoopPassAdaptor<{{.*}}LICMPass			; CHECK-O-NEXT: Running pass: FunctionToLoopPassAdaptor<{{.*}}LICMPass
	; CHECK-O-NEXT: Starting {{.*}}Function pass manager run			; CHECK-O-NEXT: Starting {{.*}}Function pass manager run
	; CHECK-O-NEXT: Running pass: LoopSimplifyPass			; CHECK-O-NEXT: Running pass: LoopSimplifyPass

llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll

	; CHECK-O-NEXT: Running pass: FunctionToLoopPassAdaptor<{{.*}}LoopRotatePass			; CHECK-O-NEXT: Running pass: FunctionToLoopPassAdaptor<{{.*}}LoopRotatePass
	; CHECK-O-NEXT: Starting {{.*}}Function pass manager run			; CHECK-O-NEXT: Starting {{.*}}Function pass manager run
	; CHECK-O-NEXT: Running pass: LoopSimplifyPass			; CHECK-O-NEXT: Running pass: LoopSimplifyPass
	; CHECK-O-NEXT: Running pass: LCSSAPass			; CHECK-O-NEXT: Running pass: LCSSAPass
	; CHECK-O-NEXT: Finished {{.*}}Function pass manager run			; CHECK-O-NEXT: Finished {{.*}}Function pass manager run
	; CHECK-O-NEXT: Running pass: LoopDistributePass			; CHECK-O-NEXT: Running pass: LoopDistributePass
	; CHECK-O-NEXT: Running pass: InjectTLIMappings			; CHECK-O-NEXT: Running pass: InjectTLIMappings
	; CHECK-O-NEXT: Running pass: LoopVectorizePass			; CHECK-O-NEXT: Running pass: LoopVectorizePass
	; CHECK-O-NEXT: Running pass: VectorCombinePass
	; CHECK-O-NEXT: Running pass: EarlyCSEPass
	; CHECK-O-NEXT: Running pass: LoopLoadEliminationPass			; CHECK-O-NEXT: Running pass: LoopLoadEliminationPass
	; CHECK-O-NEXT: Running analysis: LoopAccessAnalysis			; CHECK-O-NEXT: Running analysis: LoopAccessAnalysis
	; CHECK-O-NEXT: Running pass: InstCombinePass			; CHECK-O-NEXT: Running pass: InstCombinePass
	; CHECK-O-NEXT: Running pass: SimplifyCFGPass			; CHECK-O-NEXT: Running pass: SimplifyCFGPass
	; CHECK-O2-NEXT: Running pass: SLPVectorizerPass			; CHECK-O2-NEXT: Running pass: SLPVectorizerPass
	; CHECK-O3-NEXT: Running pass: SLPVectorizerPass			; CHECK-O3-NEXT: Running pass: SLPVectorizerPass
	; CHECK-Os-NEXT: Running pass: SLPVectorizerPass			; CHECK-Os-NEXT: Running pass: SLPVectorizerPass
				; CHECK-O-NEXT: Running pass: VectorCombinePass
				; CHECK-O-NEXT: Running pass: EarlyCSEPass
	; CHECK-O-NEXT: Running pass: InstCombinePass			; CHECK-O-NEXT: Running pass: InstCombinePass
	; CHECK-O-NEXT: Running pass: LoopUnrollPass			; CHECK-O-NEXT: Running pass: LoopUnrollPass
	; CHECK-O-NEXT: Running pass: WarnMissedTransformationsPass			; CHECK-O-NEXT: Running pass: WarnMissedTransformationsPass
	; CHECK-O-NEXT: Running pass: InstCombinePass			; CHECK-O-NEXT: Running pass: InstCombinePass
	; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}OptimizationRemarkEmitterAnalysis			; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}OptimizationRemarkEmitterAnalysis
	; CHECK-O-NEXT: Running pass: FunctionToLoopPassAdaptor<{{.*}}LICMPass			; CHECK-O-NEXT: Running pass: FunctionToLoopPassAdaptor<{{.*}}LICMPass
	; CHECK-O-NEXT: Starting {{.*}}Function pass manager run			; CHECK-O-NEXT: Starting {{.*}}Function pass manager run
	; CHECK-O-NEXT: Running pass: LoopSimplifyPass			; CHECK-O-NEXT: Running pass: LoopSimplifyPass

llvm/test/Other/opt-O2-pipeline.ll

	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Loop Access Analysis			; CHECK-NEXT: Loop Access Analysis
	; CHECK-NEXT: Demanded bits analysis			; CHECK-NEXT: Demanded bits analysis
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: Inject TLI Mappings			; CHECK-NEXT: Inject TLI Mappings
	; CHECK-NEXT: Loop Vectorization			; CHECK-NEXT: Loop Vectorization
	; CHECK-NEXT: Optimize scalar/vector ops
	; CHECK-NEXT: Early CSE
	; CHECK-NEXT: Canonicalize natural loops			; CHECK-NEXT: Canonicalize natural loops
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Loop Access Analysis			; CHECK-NEXT: Loop Access Analysis
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Loop Load Elimination			; CHECK-NEXT: Loop Load Elimination
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	[9 lines not shown]
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Demanded bits analysis			; CHECK-NEXT: Demanded bits analysis
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: Inject TLI Mappings			; CHECK-NEXT: Inject TLI Mappings
	; CHECK-NEXT: SLP Vectorizer			; CHECK-NEXT: SLP Vectorizer
				; CHECK-NEXT: Optimize scalar/vector ops
				; CHECK-NEXT: Early CSE
				; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: Combine redundant instructions			; CHECK-NEXT: Combine redundant instructions
	; CHECK-NEXT: Canonicalize natural loops			; CHECK-NEXT: Canonicalize natural loops
	; CHECK-NEXT: LCSSA Verifier			; CHECK-NEXT: LCSSA Verifier
	; CHECK-NEXT: Loop-Closed SSA Form Pass			; CHECK-NEXT: Loop-Closed SSA Form Pass
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Loop Pass Manager			; CHECK-NEXT: Loop Pass Manager
	; CHECK-NEXT: Unroll loops			; CHECK-NEXT: Unroll loops

llvm/test/Other/opt-O3-pipeline.ll

	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Loop Access Analysis			; CHECK-NEXT: Loop Access Analysis
	; CHECK-NEXT: Demanded bits analysis			; CHECK-NEXT: Demanded bits analysis
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: Inject TLI Mappings			; CHECK-NEXT: Inject TLI Mappings
	; CHECK-NEXT: Loop Vectorization			; CHECK-NEXT: Loop Vectorization
	; CHECK-NEXT: Optimize scalar/vector ops
	; CHECK-NEXT: Early CSE
	; CHECK-NEXT: Canonicalize natural loops			; CHECK-NEXT: Canonicalize natural loops
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Loop Access Analysis			; CHECK-NEXT: Loop Access Analysis
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Loop Load Elimination			; CHECK-NEXT: Loop Load Elimination
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	[9 lines not shown]
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Demanded bits analysis			; CHECK-NEXT: Demanded bits analysis
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: Inject TLI Mappings			; CHECK-NEXT: Inject TLI Mappings
	; CHECK-NEXT: SLP Vectorizer			; CHECK-NEXT: SLP Vectorizer
				; CHECK-NEXT: Optimize scalar/vector ops
				; CHECK-NEXT: Early CSE
				; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: Combine redundant instructions			; CHECK-NEXT: Combine redundant instructions
	; CHECK-NEXT: Canonicalize natural loops			; CHECK-NEXT: Canonicalize natural loops
	; CHECK-NEXT: LCSSA Verifier			; CHECK-NEXT: LCSSA Verifier
	; CHECK-NEXT: Loop-Closed SSA Form Pass			; CHECK-NEXT: Loop-Closed SSA Form Pass
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Loop Pass Manager			; CHECK-NEXT: Loop Pass Manager
	; CHECK-NEXT: Unroll loops			; CHECK-NEXT: Unroll loops

llvm/test/Other/opt-Os-pipeline.ll

	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Loop Access Analysis			; CHECK-NEXT: Loop Access Analysis
	; CHECK-NEXT: Demanded bits analysis			; CHECK-NEXT: Demanded bits analysis
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: Inject TLI Mappings			; CHECK-NEXT: Inject TLI Mappings
	; CHECK-NEXT: Loop Vectorization			; CHECK-NEXT: Loop Vectorization
	; CHECK-NEXT: Optimize scalar/vector ops
	; CHECK-NEXT: Early CSE
	; CHECK-NEXT: Canonicalize natural loops			; CHECK-NEXT: Canonicalize natural loops
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Loop Access Analysis			; CHECK-NEXT: Loop Access Analysis
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Loop Load Elimination			; CHECK-NEXT: Loop Load Elimination
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	[9 lines not shown]
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Demanded bits analysis			; CHECK-NEXT: Demanded bits analysis
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: Inject TLI Mappings			; CHECK-NEXT: Inject TLI Mappings
	; CHECK-NEXT: SLP Vectorizer			; CHECK-NEXT: SLP Vectorizer
				; CHECK-NEXT: Optimize scalar/vector ops
				; CHECK-NEXT: Early CSE
				; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: Combine redundant instructions			; CHECK-NEXT: Combine redundant instructions
	; CHECK-NEXT: Canonicalize natural loops			; CHECK-NEXT: Canonicalize natural loops
	; CHECK-NEXT: LCSSA Verifier			; CHECK-NEXT: LCSSA Verifier
	; CHECK-NEXT: Loop-Closed SSA Form Pass			; CHECK-NEXT: Loop-Closed SSA Form Pass
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Loop Pass Manager			; CHECK-NEXT: Loop Pass Manager
	; CHECK-NEXT: Unroll loops			; CHECK-NEXT: Unroll loops

llvm/test/Other/opt-pipeline-vector-passes.ll

	; RUN: opt -O1 -debug-pass=Structure < %s -o /dev/null 2>&1 | FileCheck %s --check-prefixes=OLDPM_O1			; RUN: opt -O1 -debug-pass=Structure < %s -o /dev/null 2>&1 | FileCheck %s --check-prefixes=OLDPM_O1
	; RUN: opt -O2 -debug-pass=Structure < %s -o /dev/null 2>&1 | FileCheck %s --check-prefixes=OLDPM_O2			; RUN: opt -O2 -debug-pass=Structure < %s -o /dev/null 2>&1 | FileCheck %s --check-prefixes=OLDPM_O2
	; RUN: opt -O1 -vectorize-loops=0 -debug-pass=Structure < %s -o /dev/null 2>&1 | FileCheck %s --check-prefixes=OLDPM_O1_FORCE_OFF			; RUN: opt -O1 -vectorize-loops=0 -debug-pass=Structure < %s -o /dev/null 2>&1 | FileCheck %s --check-prefixes=OLDPM_O1_FORCE_OFF
	; RUN: opt -O2 -vectorize-loops=0 -debug-pass=Structure < %s -o /dev/null 2>&1 | FileCheck %s --check-prefixes=OLDPM_O2_FORCE_OFF			; RUN: opt -O2 -vectorize-loops=0 -debug-pass=Structure < %s -o /dev/null 2>&1 | FileCheck %s --check-prefixes=OLDPM_O2_FORCE_OFF
	; RUN: opt -disable-verify -debug-pass-manager -passes='default<O1>' -S %s 2>&1 | FileCheck %s --check-prefixes=NEWPM_O1			; RUN: opt -disable-verify -debug-pass-manager -passes='default<O1>' -S %s 2>&1 | FileCheck %s --check-prefixes=NEWPM_O1
	; RUN: opt -disable-verify -debug-pass-manager -passes='default<O2>' -S %s 2>&1 | FileCheck %s --check-prefixes=NEWPM_O2			; RUN: opt -disable-verify -debug-pass-manager -passes='default<O2>' -S %s 2>&1 | FileCheck %s --check-prefixes=NEWPM_O2

	; REQUIRES: asserts			; REQUIRES: asserts

	; SLP does not run at -O1. Loop vectorization runs, but it only			; SLP does not run at -O1. Loop vectorization runs, but it only
	; works on loops explicitly annotated with pragmas.			; works on loops explicitly annotated with pragmas.

	; OLDPM_O1-LABEL: Pass Arguments:			; OLDPM_O1-LABEL: Pass Arguments:
	; OLDPM_O1: Loop Vectorization			; OLDPM_O1: Loop Vectorization
	; OLDPM_O1: Optimize scalar/vector ops
	; OLDPM_O1-NOT: SLP Vectorizer			; OLDPM_O1-NOT: SLP Vectorizer
				; OLDPM_O1: Optimize scalar/vector ops

	; Everything runs at -O2.			; Everything runs at -O2.

	; OLDPM_O2-LABEL: Pass Arguments:			; OLDPM_O2-LABEL: Pass Arguments:
	; OLDPM_O2: Loop Vectorization			; OLDPM_O2: Loop Vectorization
	; OLDPM_O2: Optimize scalar/vector ops
	; OLDPM_O2: SLP Vectorizer			; OLDPM_O2: SLP Vectorizer
				; OLDPM_O2: Optimize scalar/vector ops

	; The loop vectorizer still runs at both -O1/-O2 even with the			; The loop vectorizer still runs at both -O1/-O2 even with the
	; debug flag, but it only works on loops explicitly annotated			; debug flag, but it only works on loops explicitly annotated
	; with pragmas.			; with pragmas.

	; OLDPM_O1_FORCE_OFF-LABEL: Pass Arguments:			; OLDPM_O1_FORCE_OFF-LABEL: Pass Arguments:
	; OLDPM_O1_FORCE_OFF: Loop Vectorization			; OLDPM_O1_FORCE_OFF: Loop Vectorization
	; OLDPM_O1_FORCE_OFF: Optimize scalar/vector ops
	; OLDPM_O1_FORCE_OFF-NOT: SLP Vectorizer			; OLDPM_O1_FORCE_OFF-NOT: SLP Vectorizer
				; OLDPM_O1_FORCE_OFF: Optimize scalar/vector ops

	; OLDPM_O2_FORCE_OFF-LABEL: Pass Arguments:			; OLDPM_O2_FORCE_OFF-LABEL: Pass Arguments:
	; OLDPM_O2_FORCE_OFF: Loop Vectorization			; OLDPM_O2_FORCE_OFF: Loop Vectorization
	; OLDPM_O2_FORCE_OFF: Optimize scalar/vector ops
	; OLDPM_O2_FORCE_OFF: SLP Vectorizer			; OLDPM_O2_FORCE_OFF: SLP Vectorizer
				; OLDPM_O2_FORCE_OFF: Optimize scalar/vector ops

	; There should be no difference with the new pass manager.			; There should be no difference with the new pass manager.
	; This is tested more thoroughly in other test files.			; This is tested more thoroughly in other test files.

	; NEWPM_O1-LABEL: Running pass: LoopVectorizePass			; NEWPM_O1-LABEL: Running pass: LoopVectorizePass
	; NEWPM_O1: Running pass: VectorCombinePass
	; NEWPM_O1-NOT: Running pass: SLPVectorizerPass			; NEWPM_O1-NOT: Running pass: SLPVectorizerPass
				; NEWPM_O1: Running pass: VectorCombinePass

	; NEWPM_O2-LABEL: Running pass: LoopVectorizePass			; NEWPM_O2-LABEL: Running pass: LoopVectorizePass
	; NEWPM_O2: Running pass: VectorCombinePass
	; NEWPM_O2: Running pass: SLPVectorizerPass			; NEWPM_O2: Running pass: SLPVectorizerPass
				; NEWPM_O2: Running pass: VectorCombinePass

	define void @f() {			define void @f() {
	ret void			ret void
	}			}

llvm/test/Transforms/PhaseOrdering/X86/addsub.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -O3 -S | FileCheck %s		; RUN: opt < %s -O3 -S | FileCheck %s --check-prefixes=CHECK,OLDPM
; RUN: opt < %s -passes='default<O3>' -S | FileCheck %s		; RUN: opt < %s -passes='default<O3>' -S | FileCheck %s --check-prefixes=CHECK,NEWPM

target triple = "x86_64--"		target triple = "x86_64--"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"		target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

; Ideally, this should reach the backend with 1 fsub, 1 fadd, and 1 shuffle.		; Ideally, this should reach the backend with 1 fsub, 1 fadd, and 1 shuffle.
; That may require some coordination between VectorCombine, SLP, and other passes.		; That may require some coordination between VectorCombine, SLP, and other passes.
; The end goal is to get a single "vaddsubps" instruction for x86 with AVX.		; The end goal is to get a single "vaddsubps" instruction for x86 with AVX.

[52 lines not shown]		;
%add10 = fadd float %a11, %b11		%add10 = fadd float %a11, %b11
%retval.1.1.insert = insertelement <2 x float> %retval.1.0.insert, float %add10, i32 1		%retval.1.1.insert = insertelement <2 x float> %retval.1.0.insert, float %add10, i32 1
%fca.0.insert = insertvalue { <2 x float>, <2 x float> } undef, <2 x float> %retval.0.1.insert, 0		%fca.0.insert = insertvalue { <2 x float>, <2 x float> } undef, <2 x float> %retval.0.1.insert, 0
%fca.1.insert = insertvalue { <2 x float>, <2 x float> } %fca.0.insert, <2 x float> %retval.1.1.insert, 1		%fca.1.insert = insertvalue { <2 x float>, <2 x float> } %fca.0.insert, <2 x float> %retval.1.1.insert, 1
ret { <2 x float>, <2 x float> } %fca.1.insert		ret { <2 x float>, <2 x float> } %fca.1.insert
}		}

define void @add_aggregate_store(<2 x float> %a0, <2 x float> %a1, <2 x float> %b0, <2 x float> %b1, %struct.Vector4* nocapture dereferenceable(16) %r) {		define void @add_aggregate_store(<2 x float> %a0, <2 x float> %a1, <2 x float> %b0, <2 x float> %b1, %struct.Vector4* nocapture dereferenceable(16) %r) {
; CHECK-LABEL: @add_aggregate_store(		; OLDPM-LABEL: @add_aggregate_store(
; CHECK-NEXT: [[TMP1:%.*]] = fadd <2 x float> [[A0:%.*]], [[B0:%.*]]		; OLDPM-NEXT: [[TMP1:%.*]] = fadd <2 x float> [[A0:%.*]], [[B0:%.*]]
; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x float> [[TMP1]], i32 0		; OLDPM-NEXT: [[TMP2:%.*]] = fadd <2 x float> [[A1:%.*]], [[B1:%.*]]
; CHECK-NEXT: [[R0:%.*]] = getelementptr inbounds [[STRUCT_VECTOR4:%.*]], %struct.Vector4* [[R:%.*]], i64 0, i32 0		; OLDPM-NEXT: [[TMP3:%.*]] = shufflevector <2 x float> [[TMP1]], <2 x float> [[TMP2]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: store float [[TMP2]], float* [[R0]], align 4		; OLDPM-NEXT: [[TMP4:%.*]] = bitcast %struct.Vector4* [[R:%.*]] to <4 x float>*
; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x float> [[TMP1]], i32 1		; OLDPM-NEXT: store <4 x float> [[TMP3]], <4 x float>* [[TMP4]], align 4
; CHECK-NEXT: [[R1:%.*]] = getelementptr inbounds [[STRUCT_VECTOR4]], %struct.Vector4* [[R]], i64 0, i32 1		; OLDPM-NEXT: ret void
; CHECK-NEXT: store float [[TMP3]], float* [[R1]], align 4		;
; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x float> [[A1:%.*]], [[B1:%.*]]		; NEWPM-LABEL: @add_aggregate_store(
; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP4]], i32 0		; NEWPM-NEXT: [[TMP1:%.*]] = fadd <2 x float> [[A0:%.*]], [[B0:%.*]]
; CHECK-NEXT: [[R2:%.*]] = getelementptr inbounds [[STRUCT_VECTOR4]], %struct.Vector4* [[R]], i64 0, i32 2		; NEWPM-NEXT: [[TMP2:%.*]] = extractelement <2 x float> [[TMP1]], i32 0
; CHECK-NEXT: store float [[TMP5]], float* [[R2]], align 4		; NEWPM-NEXT: [[R0:%.*]] = getelementptr inbounds [[STRUCT_VECTOR4:%.*]], %struct.Vector4* [[R:%.*]], i64 0, i32 0
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP4]], i32 1		; NEWPM-NEXT: store float [[TMP2]], float* [[R0]], align 4
; CHECK-NEXT: [[R3:%.*]] = getelementptr inbounds [[STRUCT_VECTOR4]], %struct.Vector4* [[R]], i64 0, i32 3		; NEWPM-NEXT: [[TMP3:%.*]] = extractelement <2 x float> [[TMP1]], i32 1
; CHECK-NEXT: store float [[TMP6]], float* [[R3]], align 4		; NEWPM-NEXT: [[R1:%.*]] = getelementptr inbounds [[STRUCT_VECTOR4]], %struct.Vector4* [[R]], i64 0, i32 1
; CHECK-NEXT: ret void		; NEWPM-NEXT: store float [[TMP3]], float* [[R1]], align 4
		; NEWPM-NEXT: [[TMP4:%.*]] = fadd <2 x float> [[A1:%.*]], [[B1:%.*]]
		; NEWPM-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP4]], i32 0
		; NEWPM-NEXT: [[R2:%.*]] = getelementptr inbounds [[STRUCT_VECTOR4]], %struct.Vector4* [[R]], i64 0, i32 2
		; NEWPM-NEXT: store float [[TMP5]], float* [[R2]], align 4
		; NEWPM-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP4]], i32 1
		; NEWPM-NEXT: [[R3:%.*]] = getelementptr inbounds [[STRUCT_VECTOR4]], %struct.Vector4* [[R]], i64 0, i32 3
		; NEWPM-NEXT: store float [[TMP6]], float* [[R3]], align 4
		; NEWPM-NEXT: ret void
;		;
%a00 = extractelement <2 x float> %a0, i32 0		%a00 = extractelement <2 x float> %a0, i32 0
%b00 = extractelement <2 x float> %b0, i32 0		%b00 = extractelement <2 x float> %b0, i32 0
%add = fadd float %a00, %b00		%add = fadd float %a00, %b00
%r0 = getelementptr inbounds %struct.Vector4, %struct.Vector4* %r, i64 0, i32 0		%r0 = getelementptr inbounds %struct.Vector4, %struct.Vector4* %r, i64 0, i32 0
store float %add, float* %r0, align 4		store float %add, float* %r0, align 4
%a01 = extractelement <2 x float> %a0, i32 1		%a01 = extractelement <2 x float> %a0, i32 1
%b01 = extractelement <2 x float> %b0, i32 1		%b01 = extractelement <2 x float> %b0, i32 1
[15 lines not shown]

llvm/test/Transforms/PhaseOrdering/X86/horiz-math.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -O3 -S < %s | FileCheck %s			; RUN: opt -O3 -S < %s | FileCheck %s
	; RUN: opt -passes='default<O3>' -S < %s | FileCheck %s			; RUN: opt -passes='default<O3>' -S < %s | FileCheck %s

	target triple = "x86_64--"			target triple = "x86_64--"
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	; PR41813 - https://bugs.llvm.org/show_bug.cgi?id=41813			; PR41813 - https://bugs.llvm.org/show_bug.cgi?id=41813

	define <4 x float> @hadd_reverse_v4f32(<4 x float> %a, <4 x float> %b) #0 {			define <4 x float> @hadd_reverse_v4f32(<4 x float> %a, <4 x float> %b) #0 {
	; CHECK-LABEL: @hadd_reverse_v4f32(			; CHECK-LABEL: @hadd_reverse_v4f32(
	; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x float> [[A:%.*]], <4 x float> undef, <4 x i32> <i32 undef, i32 undef, i32 3, i32 undef>			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x float> [[A:%.*]], <4 x float> [[B:%.*]], <4 x i32> <i32 3, i32 1, i32 7, i32 5>
	; CHECK-NEXT: [[TMP2:%.*]] = fadd <4 x float> [[TMP1]], [[A]]			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x float> [[A]], <4 x float> [[B]], <4 x i32> <i32 2, i32 0, i32 6, i32 4>
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[A]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP3:%.*]] = fadd <4 x float> [[TMP1]], [[TMP2]]
	; CHECK-NEXT: [[TMP4:%.*]] = fadd <4 x float> [[TMP3]], [[A]]			; CHECK-NEXT: ret <4 x float> [[TMP3]]
	; CHECK-NEXT: [[VECINIT6:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> [[TMP4]], <4 x i32> <i32 2, i32 4, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x float> [[B:%.*]], <4 x float> undef, <4 x i32> <i32 undef, i32 undef, i32 3, i32 undef>
	; CHECK-NEXT: [[TMP6:%.*]] = fadd <4 x float> [[TMP5]], [[B]]
	; CHECK-NEXT: [[VECINIT10:%.*]] = shufflevector <4 x float> [[VECINIT6]], <4 x float> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 6, i32 undef>
	; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <4 x float> [[B]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP8:%.*]] = fadd <4 x float> [[TMP7]], [[B]]
	; CHECK-NEXT: [[VECINIT14:%.*]] = shufflevector <4 x float> [[VECINIT10]], <4 x float> [[TMP8]], <4 x i32> <i32 0, i32 1, i32 2, i32 4>
	; CHECK-NEXT: ret <4 x float> [[VECINIT14]]
	;			;
	%shuffle = shufflevector <4 x float> %a, <4 x float> %a, <4 x i32> <i32 3, i32 2, i32 1, i32 0>			%shuffle = shufflevector <4 x float> %a, <4 x float> %a, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
	%shuffle1 = shufflevector <4 x float> %b, <4 x float> %b, <4 x i32> <i32 3, i32 2, i32 1, i32 0>			%shuffle1 = shufflevector <4 x float> %b, <4 x float> %b, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
	%vecext = extractelement <4 x float> %shuffle, i32 0			%vecext = extractelement <4 x float> %shuffle, i32 0
	%vecext2 = extractelement <4 x float> %shuffle, i32 1			%vecext2 = extractelement <4 x float> %shuffle, i32 1
	%add = fadd float %vecext, %vecext2			%add = fadd float %vecext, %vecext2
	%vecinit = insertelement <4 x float> undef, float %add, i32 0			%vecinit = insertelement <4 x float> undef, float %add, i32 0
	%vecext3 = extractelement <4 x float> %shuffle, i32 2			%vecext3 = extractelement <4 x float> %shuffle, i32 2
	%vecext4 = extractelement <4 x float> %shuffle, i32 3			%vecext4 = extractelement <4 x float> %shuffle, i32 3
	%add5 = fadd float %vecext3, %vecext4			%add5 = fadd float %vecext3, %vecext4
	%vecinit6 = insertelement <4 x float> %vecinit, float %add5, i32 1			%vecinit6 = insertelement <4 x float> %vecinit, float %add5, i32 1
	%vecext7 = extractelement <4 x float> %shuffle1, i32 0			%vecext7 = extractelement <4 x float> %shuffle1, i32 0
	%vecext8 = extractelement <4 x float> %shuffle1, i32 1			%vecext8 = extractelement <4 x float> %shuffle1, i32 1
	%add9 = fadd float %vecext7, %vecext8			%add9 = fadd float %vecext7, %vecext8
	%vecinit10 = insertelement <4 x float> %vecinit6, float %add9, i32 2			%vecinit10 = insertelement <4 x float> %vecinit6, float %add9, i32 2
	%vecext11 = extractelement <4 x float> %shuffle1, i32 2			%vecext11 = extractelement <4 x float> %shuffle1, i32 2
	%vecext12 = extractelement <4 x float> %shuffle1, i32 3			%vecext12 = extractelement <4 x float> %shuffle1, i32 3
	%add13 = fadd float %vecext11, %vecext12			%add13 = fadd float %vecext11, %vecext12
	%vecinit14 = insertelement <4 x float> %vecinit10, float %add13, i32 3			%vecinit14 = insertelement <4 x float> %vecinit10, float %add13, i32 3
	ret <4 x float> %vecinit14			ret <4 x float> %vecinit14
	}			}

	define <4 x float> @reverse_hadd_v4f32(<4 x float> %a, <4 x float> %b) #0 {			define <4 x float> @reverse_hadd_v4f32(<4 x float> %a, <4 x float> %b) #0 {
	; CHECK-LABEL: @reverse_hadd_v4f32(			; CHECK-LABEL: @reverse_hadd_v4f32(
	; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x float> [[A:%.*]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x float> [[A:%.*]], <4 x float> [[B:%.*]], <4 x i32> <i32 0, i32 2, i32 4, i32 6>
	; CHECK-NEXT: [[TMP2:%.*]] = fadd <4 x float> [[TMP1]], [[A]]			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x float> [[A]], <4 x float> [[B]], <4 x i32> <i32 1, i32 3, i32 5, i32 7>
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[A]], <4 x float> undef, <4 x i32> <i32 undef, i32 undef, i32 3, i32 undef>			; CHECK-NEXT: [[TMP3:%.*]] = fadd <4 x float> [[TMP1]], [[TMP2]]
	; CHECK-NEXT: [[TMP4:%.*]] = fadd <4 x float> [[TMP3]], [[A]]			; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> undef, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
	; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> [[TMP4]], <4 x i32> <i32 undef, i32 undef, i32 6, i32 0>			; CHECK-NEXT: ret <4 x float> [[TMP4]]
	; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <4 x float> [[B:%.*]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP7:%.*]] = fadd <4 x float> [[TMP6]], [[B]]
	; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> [[TMP7]], <4 x i32> <i32 undef, i32 4, i32 2, i32 3>
	; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <4 x float> [[B]], <4 x float> undef, <4 x i32> <i32 undef, i32 undef, i32 3, i32 undef>
	; CHECK-NEXT: [[TMP10:%.*]] = fadd <4 x float> [[TMP9]], [[B]]
	; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <4 x float> [[TMP8]], <4 x float> [[TMP10]], <4 x i32> <i32 6, i32 1, i32 2, i32 3>
	; CHECK-NEXT: ret <4 x float> [[TMP11]]
	;			;
	%vecext = extractelement <4 x float> %a, i32 0			%vecext = extractelement <4 x float> %a, i32 0
	%vecext1 = extractelement <4 x float> %a, i32 1			%vecext1 = extractelement <4 x float> %a, i32 1
	%add = fadd float %vecext, %vecext1			%add = fadd float %vecext, %vecext1
	%vecinit = insertelement <4 x float> undef, float %add, i32 0			%vecinit = insertelement <4 x float> undef, float %add, i32 0
	%vecext2 = extractelement <4 x float> %a, i32 2			%vecext2 = extractelement <4 x float> %a, i32 2
	%vecext3 = extractelement <4 x float> %a, i32 3			%vecext3 = extractelement <4 x float> %a, i32 3
	%add4 = fadd float %vecext2, %vecext3			%add4 = fadd float %vecext2, %vecext3
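To make the right-hand CHECK lines for `@reverse_hadd_v4f32` easier to follow, here is a minimal Python sketch of what that four-instruction sequence computes. It is a model for illustration only, not LLVM code: the helper `shufflevector` and the Python function `reverse_hadd_v4f32` are hypothetical names, and `None` is assumed to stand in for an `undef` lane.

```python
def shufflevector(v1, v2, mask):
    """Model of LLVM's shufflevector: each mask index selects a lane from
    the concatenation v1 + v2. None models an undef mask element."""
    both = list(v1) + list(v2)
    return [None if i is None else both[i] for i in mask]


def reverse_hadd_v4f32(a, b):
    """Mirror the right-hand CHECK lines: even lanes, odd lanes, add, reverse."""
    even = shufflevector(a, b, [0, 2, 4, 6])   # [a0, a2, b0, b2]
    odd = shufflevector(a, b, [1, 3, 5, 7])    # [a1, a3, b1, b3]
    sums = [x + y for x, y in zip(even, odd)]  # pairwise horizontal sums
    # The second operand is undef in the IR; it is never selected here.
    return shufflevector(sums, [None] * 4, [3, 2, 1, 0])


print(reverse_hadd_v4f32([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]))
```

With these inputs the sketch yields `[70.0, 30.0, 7.0, 3.0]`, i.e. `[b2+b3, b0+b1, a2+a3, a0+a1]`: a horizontal add of both inputs followed by a lane reversal.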

llvm/test/Transforms/PhaseOrdering/X86/vector-reductions.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -O2 -S -mattr=avx < %s | FileCheck %s			; RUN: opt -O2 -S -mattr=avx < %s | FileCheck %s
	; RUN: opt -passes='default<O2>' -S -mattr=avx < %s | FileCheck %s			; RUN: opt -passes='default<O2>' -S -mattr=avx < %s | FileCheck %s

	target triple = "x86_64--"			target triple = "x86_64--"
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	; FIXME: This should only need 2 'or' instructions.

	define i32 @ext_ext_or_reduction_v4i32(<4 x i32> %x, <4 x i32> %y) {			define i32 @ext_ext_or_reduction_v4i32(<4 x i32> %x, <4 x i32> %y) {
				RKSimon: update comment?
				spatel: Right - will update on push.
	; CHECK-LABEL: @ext_ext_or_reduction_v4i32(			; CHECK-LABEL: @ext_ext_or_reduction_v4i32(
	; CHECK-NEXT: [[Z:%.*]] = and <4 x i32> [[Y:%.*]], [[X:%.*]]			; CHECK-NEXT: [[Z:%.*]] = and <4 x i32> [[Y:%.*]], [[X:%.*]]
	; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[Z]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[Z]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP2:%.*]] = or <4 x i32> [[Z]], [[TMP1]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = or <4 x i32> [[Z]], [[RDX_SHUF]]
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x i32> [[Z]], <4 x i32> undef, <4 x i32> <i32 2, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[BIN_RDX]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP4:%.*]] = or <4 x i32> [[TMP2]], [[TMP3]]			; CHECK-NEXT: [[BIN_RDX2:%.*]] = or <4 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[Z]], <4 x i32> undef, <4 x i32> <i32 3, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x i32> [[BIN_RDX2]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = or <4 x i32> [[TMP4]], [[TMP5]]			; CHECK-NEXT: ret i32 [[TMP1]]
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i32> [[TMP6]], i32 0
	; CHECK-NEXT: ret i32 [[TMP7]]
	;			;
	%z = and <4 x i32> %x, %y			%z = and <4 x i32> %x, %y
	%z0 = extractelement <4 x i32> %z, i32 0			%z0 = extractelement <4 x i32> %z, i32 0
	%z1 = extractelement <4 x i32> %z, i32 1			%z1 = extractelement <4 x i32> %z, i32 1
	%z01 = or i32 %z0, %z1			%z01 = or i32 %z0, %z1
	%z2 = extractelement <4 x i32> %z, i32 2			%z2 = extractelement <4 x i32> %z, i32 2
	%z012 = or i32 %z01, %z2			%z012 = or i32 %z01, %z2
	%z3 = extractelement <4 x i32> %z, i32 3			%z3 = extractelement <4 x i32> %z, i32 3
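The two columns of CHECK lines for `@ext_ext_or_reduction_v4i32` reduce the same vector in different shapes: the left column chains three vector `or` instructions, while the right column uses a two-step shuffle tree. The Python sketch below models both to show they agree on lane 0. It is an illustration under stated assumptions, not LLVM code: `shuffle`, `vec_or`, `reduce_or_sequential`, and `reduce_or_tree` are hypothetical helper names, `None` models an `undef` lane, and the input `z` plays the role of `%z = and <4 x i32> %x, %y`.

```python
def shuffle(v, mask):
    """Single-input shufflevector model; None marks an undef lane."""
    return [None if i is None else v[i] for i in mask]


def vec_or(x, y):
    """Lane-wise 'or'; a lane is undef (None) if either input lane is."""
    return [None if a is None or b is None else a | b for a, b in zip(x, y)]


def reduce_or_sequential(z):
    """Left column: three shuffles and three vector 'or' instructions."""
    t2 = vec_or(z, shuffle(z, [1, None, None, None]))
    t4 = vec_or(t2, shuffle(z, [2, None, None, None]))
    t6 = vec_or(t4, shuffle(z, [3, None, None, None]))
    return t6[0]  # extractelement ..., i32 0


def reduce_or_tree(z):
    """Right column: two shuffles and two vector 'or' instructions."""
    rdx = vec_or(z, shuffle(z, [2, 3, None, None]))        # lanes 0|2, 1|3
    rdx2 = vec_or(rdx, shuffle(rdx, [1, None, None, None]))
    return rdx2[0]  # extractelement ..., i32 0


z = [1, 2, 4, 8]
print(reduce_or_sequential(z), reduce_or_tree(z))
```

Both functions return 15 for this input. The tree form needs only log2(4) = 2 `or` steps, which matches the count that the FIXME in the left column asks for.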


Diff 265758
