This is an archive of the discontinued LLVM Phabricator instance.

Differential D75145

[PassManager] adjust VectorCombine placement
Closed, Public

Authored by spatel on Feb 25 2020, 2:59 PM.

Details

Reviewers

lebedev.ri
efriedma
RKSimon
echristo
fhahn
xbolva00
hfinkel
fedor.sergeev

Commits

rG71a316883d50: [PassManager] adjust VectorCombine placement

Summary

The initial placement of vector-combine in the opt pipeline revealed phase ordering bugs:
https://bugs.llvm.org/show_bug.cgi?id=45015
https://bugs.llvm.org/show_bug.cgi?id=42022

This patch proposes a few changes:

1. Move the pass up in the pipeline, so it happens just after loop-vectorization. This is only to keep vectorization passes together in the pipeline at the moment. I don't have any evidence of interaction between these yet.
2. Add an -early-cse pass after -vector-combine to clean up redundant ops. This was partly proposed as far back as rL219644 (which is why it's effectively being moved in the old PM code). This is important because the subsequent -instcombine doesn't work as well without this. With the CSE, -instcombine is able to squash shuffles together in 1 of the tests (because those are simple "select" shuffles).
3. Remove the -vector-combine pass that was running after SLP. We may want to do that eventually, but I don't have a test case to support it yet.
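
As a rough illustration of the three changes listed in the summary, the toy C++ model below treats the pipeline as a plain list of pass names. The `Pipeline` alias and the `beforePatch` and `adjustVectorCombinePlacement` helpers are hypothetical names invented for this sketch and are not LLVM APIs; the pass names and their order are taken from the new-pm-defaults.ll checks in this revision.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy model only: a pipeline is just an ordered list of pass names.
using Pipeline = std::vector<std::string>;

// Hypothetical helper: the relevant slice of the pipeline before this patch,
// as shown by the removed/unchanged CHECK lines in new-pm-defaults.ll.
Pipeline beforePatch() {
  return {"LoopDistribute", "LoopVectorize", "LoopLoadElimination",
          "VectorCombine",  "InstCombine",   "SimplifyCFG",
          "SLPVectorizer",  "VectorCombine", "InstCombine"};
}

// Hypothetical helper (not an LLVM API): apply the summary's changes to a
// list of pass names.
Pipeline adjustVectorCombinePlacement(Pipeline P) {
  // Drop every existing VectorCombine run. This covers both the run that is
  // being moved and the run after SLPVectorizer that is being removed.
  P.erase(std::remove(P.begin(), P.end(), std::string("VectorCombine")),
          P.end());

  // Re-insert VectorCombine right after LoopVectorize and follow it with
  // EarlyCSE to clean up redundant ops.
  auto It = std::find(P.begin(), P.end(), std::string("LoopVectorize"));
  if (It != P.end()) {
    const Pipeline Added = {"VectorCombine", "EarlyCSE"};
    P.insert(It + 1, Added.begin(), Added.end());
  }
  return P;
}
```

Calling `adjustVectorCombinePlacement(beforePatch())` in this model gives LoopDistribute, LoopVectorize, VectorCombine, EarlyCSE, LoopLoadElimination, InstCombine, SimplifyCFG, SLPVectorizer, InstCombine, which is the order the updated test expects.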

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

spatel created this revision. Feb 25 2020, 2:59 PM

Herald added a project: Restricted Project. Feb 25 2020, 2:59 PM

Herald added subscribers: dexonsmith, steven_wu, hiraditya, mcrosier.

I didn't see those pass ordering issues coming, so here I think i'm going to defer to other reviewers :)
While there, can we add an "-enable-vector-combiner=true" option to the VectorCombiner?

spatel mentioned this in D75204: [VectorCombine] add a debug flag to skip all transforms. Feb 26 2020, 11:46 AM

spatel mentioned this in rG25c6544f32ee: [VectorCombine] add a debug flag to skip all transforms. Feb 26 2020, 12:18 PM

In D75145#1892407, @lebedev.ri wrote:

While there, can we add an "-enable-vector-combiner=true" option to the VectorCombiner?

Yes - that will be handy as we untangle interactions between this pass and others:
rG25c6544f32ee

Adding some more potential reviewers for a pipeline alteration.

One observation i can make - i think we have a test coverage (-debug-pass=Structure) issue for -fno-vectorize,
i'd think we should see in tests that the pass no longer runs in presence of -fno-vectorize, but i don't think we do?

LG with suggested testcase.

Patch updated:
After thinking this over (and stepping through the various existing vectorization enable/disable flags), I'm removing the loop-vectorizer predicate from this patch. The reasons are:

1. Although the flag is called "-fno-vectorize" in clang, it only applies to loop-vectorization in LLVM, and the enable/disable logic is complicated. This is apparently necessary because we want vector pragmas on loops to override that flag.
2. -vector-combine is not something that we want to limit to -O2 (a silent side condition of the "LoopVectorize" predicate).
3. -vector-combine is not strictly about vectorizing code; the cleanup ability could extend to scalarizing in the future (ideally, we may offload some functionality that exists in or is proposed for InstCombine).

Part of the motivation for having a disable flag was addressed by adding a dedicated debug flag for -vector-combine with:
rG25c6544f32ee

So this patch is now simpler. If we do want to gate the cleanup passes in conjunction with other vectorization passes, we'll have better test coverage via:
rG99b86d76b5e1
(no diffs for now from this patch)

Would be good to hear from someone more involved with pipeline ordering, but test changes look good to me..

This revision is now accepted and ready to land. Feb 29 2020, 7:51 AM

The changes seem relatively safe, but I am wondering if the vector combine pass makes the CSE problem more acute? Otherwise it might be better to add the extra EarlyCSE run separately (I'm not sure the name will be quite accurate after the change, it runs quite late now :))

In D75145#1899555, @fhahn wrote:

The changes seem relatively safe, but I am wondering if the vector combine pass makes the CSE problem more acute? Otherwise it might be better to add the extra EarlyCSE run separately (I'm not sure the name will be quite accurate after the change, it runs quite late now :))

Yes, it's more just plain CSE. Several targets like AArch64, AMDGPU, and PowerPC already use that pass even later during target-specific IR codegen, so this isn't even pushing the edge. :)
And yes, there are 3 somewhat independent diffs here as mentioned in the description. The addition of CSE is the only 1 that I'm aware of that will result in a test diff. So I can commit the others separately to reduce risk, but I'm not sure how to extract a test diff for those changes.

In D75145#1899652, @spatel wrote:

Yes, it's more just plain CSE. Several targets like AArch64, AMDGPU, and PowerPC already use that pass even later during target-specific IR codegen, so this isn't even pushing the edge. :)
And yes, there are 3 somewhat independent diffs here as mentioned in the description. The addition of CSE is the only 1 that I'm aware of that will result in a test diff. So I can commit the others separately to reduce risk, but I'm not sure how to extract a test diff for those changes.

Not sure if I answered that question well enough: does the vector combine pass make the CSE problem more acute? Yes - as seen in the IR test diffs, we're potentially creating more CSE opportunities than existed before. In the bug reports, this interferes with later SLP transforms (and that pass may create CSE opportunities itself, but we're not addressing that with this patch).
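
To make "CSE opportunities" concrete, here is a minimal sketch of common-subexpression elimination over a single straight-line block. It assumes SSA-style values and side-effect-free operations, and it is not how LLVM's EarlyCSE is implemented (the real pass handles far more, such as memory operations and multiple blocks). The `Inst` struct and `toyCSE` function are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical three-address instruction: Dest = Op Lhs, Rhs.
struct Inst {
  std::string Dest, Op, Lhs, Rhs;
};

// Toy CSE: if the same (Op, Lhs, Rhs) was already computed in this block,
// drop the later instruction and rewrite its users to the earlier result.
std::vector<Inst> toyCSE(const std::vector<Inst> &Block) {
  std::map<std::tuple<std::string, std::string, std::string>, std::string>
      Available;                                  // expression -> first result
  std::map<std::string, std::string> Replacement; // dropped value -> survivor
  std::vector<Inst> Out;

  // Map an operand to its surviving equivalent, if it was eliminated.
  auto Resolve = [&](const std::string &V) {
    auto It = Replacement.find(V);
    return It == Replacement.end() ? V : It->second;
  };

  for (const Inst &I : Block) {
    Inst Cur{I.Dest, I.Op, Resolve(I.Lhs), Resolve(I.Rhs)};
    auto Key = std::make_tuple(Cur.Op, Cur.Lhs, Cur.Rhs);
    auto It = Available.find(Key);
    if (It != Available.end()) {
      // Redundant: remember which earlier value replaces this one.
      Replacement[Cur.Dest] = It->second;
      continue;
    }
    Available.emplace(Key, Cur.Dest);
    Out.push_back(Cur);
  }
  return Out;
}
```

For a block that extracts the same vector element twice and then adds the two results, this sketch keeps one extract and rewrites the add to use it for both operands. Removing duplicates like that is the kind of cleanup that can leave simpler input for a later combining pass.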

Closed by commit rG71a316883d50: [PassManager] adjust VectorCombine placement (authored by spatel). Mar 4 2020, 8:26 AM

This revision was automatically updated to reflect the committed changes.

In D75145#1901470, @spatel wrote:

Not sure if I answered that question well enough: does the vector combine pass make the CSE problem more acute? Yes - as seen in the IR test diffs, we're potentially creating more CSE opportunities than existed before. In the bug reports, this interferes with later SLP transforms (and that pass may create CSE opportunities itself, but we're not addressing that with this patch).

SGTM, thanks!

When you say "phase ordering bug" that is about not generating as optimized code as expected, right? Not ending up with miscompiles (or compiler crashes)?

(just curious since we got regressions downstream after this patch... haven't looked deeper at that and it could just be some limitations in our backend, but could be nice to know if it is "safe" for us to revert this downstream while investigating)

In D75145#1909324, @bjope wrote:

When you say "phase ordering bug" that is about not generating as optimized code as expected, right? Not ending up with miscompiles (or compiler crashes)?

That is my understanding, yes, no miscompiles.

(just curious since we got regressions downstream after this patch... haven't looked deeper at that and it could just be some limitations in our backend, but could be nice to know if it is "safe" for us to revert this downstream while investigating)

In D75145#1909324, @bjope wrote:

When you say "phase ordering bug" that is about not generating as optimized code as expected, right? Not ending up with miscompiles (or compiler crashes)?

(just curious since we got regressions downstream after this patch... haven't looked deeper at that and it could just be some limitations in our backend, but could be nice to know if it is "safe" for us to revert this downstream while investigating)

Yes, this patch is only about getting more optimized code through the opt pipeline.

Other than this problem:
D75327
...I don't know of VectorCombine causing miscompiles/crashing.

just curious since we got regressions downstream after this patch... haven't looked deeper at that

Same here. It looks like running cse before instcombine is altering a fair amount, at least in a way that our Low Overhead loop pass does not like. I'm not sure if there are other problems or if it's just that.

Looking at it, the way the iteration count is calculated is done differently now. This code:
https://godbolt.org/z/2gBwF2
Has changed the way that the vector preheader calculated the loop iteration values. This is after (top) and before (bottom):
https://godbolt.org/z/tocy_x
Notice the differences in %n.mod.vf = and i32 %blockSize, 7 vs %n.vec = and i32 %blockSize, -8. The SCEV of the BETC for the vector body is then unknown in the new case. I think that's what's causing the low overhead loop pass to go wrong, probably the unrolling too.

Any thoughts?
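
The two `and` forms quoted above are arithmetically related. Assuming the vector trip count `n.vec` is computed as `blockSize` minus `n.mod.vf` (which the value names suggest, but which is an assumption here), then `X - (X & 7)` equals `X & -8`, an instance of the `(X - (X & Y)) --> (X & ~Y)` fold named in D75757. The small C++ check below is only a sketch of that identity on 32-bit unsigned values; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// The unfolded shape: subtract the masked-off low bits from X.
constexpr std::uint32_t subOfMasked(std::uint32_t X, std::uint32_t Y) {
  return X - (X & Y);
}

// The folded shape: clear the bits of Y directly.
constexpr std::uint32_t maskWithNot(std::uint32_t X, std::uint32_t Y) {
  return X & ~Y;
}

// Brute-force check of the identity for all X and Y below 256, plus the
// specific mask from the thread (Y = 7) on a few larger values.
bool foldHoldsForSmallValues() {
  for (std::uint32_t X = 0; X < 256; ++X)
    for (std::uint32_t Y = 0; Y < 256; ++Y)
      if (subOfMasked(X, Y) != maskWithNot(X, Y))
        return false;
  const std::uint32_t Large[] = {1000u, 65535u, 0x80000001u, 0xFFFFFFFFu};
  for (std::uint32_t X : Large)
    if (subOfMasked(X, 7u) != maskWithNot(X, 7u))
      return false;
  return true;
}
```

For example, with a block size of 27 and a vector factor of 8, `27 & 7` is 3 and both `27 - 3` and `27 & ~7u` give 24. The two forms compute the same value, so the difference discussed in the thread is about which form later analyses recognize, not about correctness.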

In D75145#1909503, @dmgreen wrote:

just curious since we got regressions downstream after this patch... haven't looked deeper at that

Same here. It looks like running cse before instcombine is altering a fair amount, at least in a way that our Low Overhead loop pass does not like. I'm not sure if there are other problems or if it's just that.

Looking at it, the way the iteration count is calculated is done differently now. This code:
https://godbolt.org/z/2gBwF2
Has changed the way that the vector preheader calculated the loop iteration values. This is after (top) and before (bottom):
https://godbolt.org/z/tocy_x
Notice the differences in %n.mod.vf = and i32 %blockSize, 7 vs %n.vec = and i32 %blockSize, -8. The SCEV of the BETC for the vector body is then unknown in the new case. I think that's what's causing the low overhead loop pass to go wrong, probably the unrolling too.

Any thoughts?

I don't know if that helps the problem overall, but i see yet another seemingly-bogus one-use check restriction there.
https://godbolt.org/z/z8oyUC
I'll see if i can post a patch..

In D75145#1909542, @lebedev.ri wrote:

I don't know if that helps the problem overall, but i see yet another seemingly-bogus one-use check restriction there.
https://godbolt.org/z/z8oyUC
I'll see if i can post a patch..

Hm, no, doesn't help much https://godbolt.org/z/G24anE
Though from SCEV side i'd say that was overall helpful, less <<Unknown>>s.

In D75145#1909597, @lebedev.ri wrote:

Hm, no, doesn't help much https://godbolt.org/z/G24anE
Though from SCEV side i'd say that was overall helpful, less <<Unknown>>s.

Just to make sure I'm seeing it correctly:

1. The problem(s) we're discussing are independent of VectorCombine.
2. The extra run of EarlyCSE is making InstCombine more effective (that was the intent of this patch).
3. The differences in IR after InstCombine are causing problems for passes later in the pipeline.

That matches my understanding.

Not sure about #2. Define "more effective". Creating unanalyzable loops isn't very effective ;)

I was working under the assumption that the old form was more canonical, and CSE had messed that up somehow. I might well have had that backwards though, and you might be right. Perhaps the new form would be better, if only SCEV could understand it?

I'm not sure if it's possible to fix SCEV in these cases? Any ideas? Not being able to calculate BackEdgeTakenCounts for vector loop bodies sounds like a big problem. For us it ends up disabling low overhead/hardware loops, so tail predication would also be affected (if it was enabled). Loop unrolling would also be affected if it was desirable for vectorized loops (in our case it is only by accident for a few edge cases).

In D75145#1909694, @spatel wrote:

In D75145#1909597, @lebedev.ri wrote:

In D75145#1909542, @lebedev.ri wrote:

In D75145#1909503, @dmgreen wrote:

just curious since we got regressions downstream after this patch... haven't looked deeper at that

Same here. It looks like running cse before instcombine is altering a fair amount, at least in a way that our Low Overhead loop pass does not like. I'm not sure if there are other problems or if it's just that.

Looking at it, the way the iteration count is calculated is done differently now. This code:
https://godbolt.org/z/2gBwF2
Has changed the way that the vector preheader calculated the loop iteration values. This is after (top) and before (bottom):
https://godbolt.org/z/tocy_x

Oh wait, the bottom source is the good one? I misread that completely.

Notice the differences in %n.mod.vf = and i32 %blockSize, 7 vs %n.vec = and i32 %blockSize, -8. The SCEV of the BETC for the vector body is then unknown in the new case. I think that's what's causing the low overhead loop pass to go wrong, probably the unrolling too.

Any thoughts?

I don't know if that helps the problem overall, but i see yet another seemingly-bogus one-use check restriction there.
https://godbolt.org/z/z8oyUC
I'll see if i can post a patch..

Hm, no, doesn't help much https://godbolt.org/z/G24anE
Though from SCEV side i'd say that was overall helpful, less <<Unknown>>s.

Just to make sure I'm seeing it correctly:

1. The problem(s) we're discussing are independent of VectorCombine.
2. The extra run of EarlyCSE is making InstCombine more effective (that was the intent of this patch).
3. The differences in IR after InstCombine are causing problems for passes later in the pipeline.

.. but while i completely misread the bug report, i still apparently correctly identified the problem, and provided a fix (D75757).

lebedev.ri mentioned this in D75757: [InstComine] Forego of one-use check in `(X - (X & Y)) --> (X & ~Y)` if Y is a constant. Mar 6 2020, 9:38 AM

@dmgreen i expect that regression to have been fixed by rG1badf7c33a5d01900c77646f750e2ea11ad8bf5a, please confirm.
Please do post more findings, if there are any.

In D75145#1909968, @lebedev.ri wrote:

.. but while i completely misread the bug report, i still apparently correctly identified the problem, and provided a fix (D75757).

I'm still trying to step through how this affects SCEV (haven't looked at SCEV much before). Are you seeing that D75757 solves the ARM codegen regression, or do we still need to fix something in SCEV and/or the later passes?
@dmgreen - can you describe what we're seeing in the final output? Ie, what asm diff should we focus on, and was the code before this patch ideal?

In D75145#1910117, @lebedev.ri wrote:

@dmgreen i expect that regression to have been fixed by rG1badf7c33a5d01900c77646f750e2ea11ad8bf5a, please confirm.
Please do post more findings, if there are any.

Oops...you're moving faster than me or my email inbox, so disregard my earlier question. :)
I'd still like to understand if we have the ideal ARM output now. And maybe we can add a test to PhaseOrdering based on that ARM example, so we have coverage for the larger case.
Also, we may still have other problems as noted by @bjope .

lebedev.ri mentioned this in rG1badf7c33a5d: [InstComine] Forego of one-use check in `(X - (X & Y)) --> (X & ~Y)` if Y…. Mar 6 2020, 11:03 AM

In D75145#1910129, @spatel wrote:

In D75145#1910117, @lebedev.ri wrote:

@dmgreen i expect that regression to have been fixed by rG1badf7c33a5d01900c77646f750e2ea11ad8bf5a, please confirm.
Please do post more findings, if there are any.

Oops...you're moving faster than me or my email inbox, so disregard my earlier question. :)

I'd still like to understand if we have the ideal ARM output now.

Look/compare -analyze -scalar-evolution output:
https://godbolt.org/z/XvgSua

first source is before *this patch*
second is after this patch, we clearly regress
and the last is after my fix, we're back to the original

And maybe we can add a test to PhaseOrdering based on that ARM example, so we have coverage for the larger case.

Yes, feel free to.

Also, we may still have other problems as noted by @bjope .

Yep, waiting for more examples.

My benchmarks were still running. D75757 wasn't in review long enough for them to complete before it went in (they seem to be being a bit slow, and phab seems to be sending emails through in chunks).

It looks like it's made things (a lot) worse, not better. For "normal" code this time, not vectorised. The issue here with vector loops might be improved. It's hard to tell. There are so many other regressions I can't really give you a quick answer. I mean, there are some improvements mixed in, but the total is definitely down. Not sure if this is an ARM issue again, or something more general. It doesn't affect (-Oz) codesize at all, or 6m, which might suggest that it's not just as simple as it disabling some analyses. I will see what I can find out, but we are going in the wrong direction here.

Adding some phase ordering tests for some of this sounds very useful. I'll see what I can add. With unrolling and vectorisation and the rest, they might get quite verbose. I'll see.

And to answer your question: the part of the assembly that was important for performance in this first case was this vector body:

vldrh.u16       q0, [r0], #16
subs.w  r12, r12, #8
vqabs.s16       q0, q0
vstrb.8 q0, [r1], #16
bne     .LBB0_4

Which could be using a LE low overhead loop instruction:

vldrh.u16       q0, [r0], #16
vqabs.s16       q0, q0
vstrb.8 q0, [r1], #16
le     lr, .LBB0_4

There is a pass in the IR part of the backend that looks for loops, finds the BETC and adds hardware loop intrinsics for it. It's essentially a hardware loop so you don't need to execute the subs or the bne on each iteration.

The benchmark with a significant degradation I had noticed for our downstream target probably also suffered from missing BETC (messing up hwloop / software pipelining so quite a huge penalty).

I applied the patch from https://reviews.llvm.org/D75757 and then things looked good again. So that patch at least solved my problem (thanks!).

spatel mentioned this in D80236: [VectorCombine] position pass after SLP in the optimization pipeline rather than before. May 23 2020, 5:41 AM

spatel mentioned this in rG57bb4787d72f: [Pass Manager] remove EarlyCSE as clean-up for VectorCombine. May 24 2020, 9:37 AM

spatel mentioned this in rG098e48a6a155: [PassManager] restore early-cse to vector cleanup. Jun 14 2020, 7:29 AM

Revision Contents

Path                                                           Size
llvm/lib/Passes/PassBuilder.cpp                                9 lines
llvm/lib/Transforms/IPO/PassManagerBuilder.cpp                 5 lines
llvm/test/Other/new-pm-defaults.ll                             6 lines
llvm/test/Other/new-pm-thinlto-defaults.ll                     6 lines
llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll        6 lines
llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll  6 lines
llvm/test/Other/opt-O2-pipeline.ll                             5 lines
llvm/test/Other/opt-O3-pipeline.ll                             5 lines
llvm/test/Other/opt-Os-pipeline.ll                             5 lines
llvm/test/Transforms/PhaseOrdering/X86/addsub.ll               34 lines

Diff 248184

llvm/lib/Passes/PassBuilder.cpp

ModulePassManager PassBuilder::buildModuleOptimizationPipeline(
   // currently only performed for loops marked with the metadata
   // llvm.loop.distribute=true or when -enable-loop-distribute is specified.
   OptimizePM.addPass(LoopDistributePass());

   // Now run the core loop vectorizer.
   OptimizePM.addPass(LoopVectorizePass(
       LoopVectorizeOptions(!PTO.LoopInterleaving, !PTO.LoopVectorization)));

+  // Enhance/cleanup vector code.
+  OptimizePM.addPass(VectorCombinePass());
+  OptimizePM.addPass(EarlyCSEPass());
+
   // Eliminate loads by forwarding stores from the previous iteration to loads
   // of the current iteration.
   OptimizePM.addPass(LoopLoadEliminationPass());

   // Cleanup after the loop optimization passes.
-  OptimizePM.addPass(VectorCombinePass());
   OptimizePM.addPass(InstCombinePass());

   // Now that we've formed fast to execute loop structures, we do further
   // optimizations. These are run afterward as they might block doing complex
   // analyses and transforms such as what are needed for loop vectorization.

   // Cleanup after loop vectorization, etc. Simplification passes like CVP and
   // GVN, loop transforms, and others have already run, so it's now better to
   // convert to more optimized IR using more aggressive simplify CFG options.
   // The extra sinking transform can create larger basic blocks, so do this
   // before SLP vectorization.
   OptimizePM.addPass(SimplifyCFGPass(SimplifyCFGOptions().
                                      forwardSwitchCondToPhi(true).
                                      convertSwitchToLookupTable(true).
                                      needCanonicalLoops(false).
                                      sinkCommonInsts(true)));

   // Optimize parallel scalar instruction chains into SIMD instructions.
-  if (PTO.SLPVectorization) {
+  if (PTO.SLPVectorization)
     OptimizePM.addPass(SLPVectorizerPass());
-    OptimizePM.addPass(VectorCombinePass());
-  }

   OptimizePM.addPass(InstCombinePass());

   // Unroll small loops to hide loop backedge latency and saturate any parallel
   // execution resources of an out-of-order processor. We also then need to
   // clean up redundancies and loop invariant code.
   // FIXME: It would be really good to use a loop-integrated instruction
   // combiner for cleanup here so that the unrolling and LICM can be pipelined

llvm/lib/Transforms/IPO/PassManagerBuilder.cpp

void PassManagerBuilder::populateModulePassManager(

   // Distribute loops to allow partial vectorization. I.e. isolate dependences
   // into separate loop that would otherwise inhibit vectorization. This is
   // currently only performed for loops marked with the metadata
   // llvm.loop.distribute=true or when -enable-loop-distribute is specified.
   MPM.add(createLoopDistributePass());

   MPM.add(createLoopVectorizePass(!LoopsInterleaved, !LoopVectorize));
+  MPM.add(createVectorCombinePass());
+  MPM.add(createEarlyCSEPass());

   // Eliminate loads by forwarding stores from the previous iteration to loads
   // of the current iteration.
   MPM.add(createLoopLoadEliminationPass());

   // FIXME: Because of #pragma vectorize enable, the passes below are always
   // inserted in the pipeline, even when the vectorizer doesn't run (ex. when
   // on -O1 and no #pragma is found). Would be good to have these two passes
   // as function calls, so that we can only pass them when the vectorizer
   // changed the code.
-  MPM.add(createVectorCombinePass());
   addInstructionCombiningPass(MPM);
   if (OptLevel > 1 && ExtraVectorizerPasses) {
     // At higher optimization levels, try to clean up any runtime overlap and
     // alignment checks inserted by the vectorizer. We want to track correllated
     // runtime checks for two inner loops in the same outer loop, fold any
     // common computations, hoist loop-invariant aspects out of any outer loop,
     // and unswitch the runtime checks if possible. Once hoisted, we may have
     // dead (or speculatable) control flows or more combining opportunities.
-    MPM.add(createEarlyCSEPass());
     MPM.add(createCorrelatedValuePropagationPass());
     addInstructionCombiningPass(MPM);
     MPM.add(createLICMPass(LicmMssaOptCap, LicmMssaNoAccForPromotionCap));
     MPM.add(createLoopUnswitchPass(SizeLevel || OptLevel < 3, DivergentTarget));
     MPM.add(createCFGSimplificationPass());
     addInstructionCombiningPass(MPM);
   }

   // Cleanup after loop vectorization, etc. Simplification passes like CVP and
   // GVN, loop transforms, and others have already run, so it's now better to
   // convert to more optimized IR using more aggressive simplify CFG options.
   // The extra sinking transform can create larger basic blocks, so do this
   // before SLP vectorization.
   MPM.add(createCFGSimplificationPass(1, true, true, false, true));

   if (SLPVectorize) {
     MPM.add(createSLPVectorizerPass()); // Vectorize parallel scalar chains.
-    MPM.add(createVectorCombinePass());
     if (OptLevel > 1 && ExtraVectorizerPasses) {
       MPM.add(createEarlyCSEPass());
     }
   }

   addExtensionsToPM(EP_Peephole, MPM);
   addInstructionCombiningPass(MPM);

llvm/test/Other/new-pm-defaults.ll

 ; CHECK-O-NEXT: Starting llvm::Function pass manager run.
 ; CHECK-O-NEXT: Running pass: LoopSimplifyPass
 ; CHECK-O-NEXT: Running pass: LCSSAPass
 ; CHECK-O-NEXT: Finished llvm::Function pass manager run.
 ; CHECK-O-NEXT: Running pass: LoopDistributePass
 ; CHECK-O-NEXT: Running pass: LoopVectorizePass
 ; CHECK-O-NEXT: Running analysis: BlockFrequencyAnalysis
 ; CHECK-O-NEXT: Running analysis: BranchProbabilityAnalysis
+; CHECK-O-NEXT: Running pass: VectorCombinePass
+; CHECK-O-NEXT: Running pass: EarlyCSEPass
 ; CHECK-O-NEXT: Running pass: LoopLoadEliminationPass
 ; CHECK-O-NEXT: Running analysis: LoopAccessAnalysis
-; CHECK-O-NEXT: Running pass: VectorCombinePass
 ; CHECK-O-NEXT: Running pass: InstCombinePass
 ; CHECK-O-NEXT: Running pass: SimplifyCFGPass
 ; CHECK-O2-NEXT: Running pass: SLPVectorizerPass
 ; CHECK-O3-NEXT: Running pass: SLPVectorizerPass
 ; CHECK-Os-NEXT: Running pass: SLPVectorizerPass
-; CHECK-O2-NEXT: Running pass: VectorCombinePass
-; CHECK-O3-NEXT: Running pass: VectorCombinePass
-; CHECK-Os-NEXT: Running pass: VectorCombinePass
 ; CHECK-O-NEXT: Running pass: InstCombinePass
 ; CHECK-O-NEXT: Running pass: LoopUnrollPass
 ; CHECK-O-NEXT: Running pass: WarnMissedTransformationsPass
 ; CHECK-O-NEXT: Running pass: InstCombinePass
 ; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}OptimizationRemarkEmitterAnalysis
	; CHECK-O-NEXT: Running pass: FunctionToLoopPassAdaptor<{{.*}}LICMPass			; CHECK-O-NEXT: Running pass: FunctionToLoopPassAdaptor<{{.*}}LICMPass
	; CHECK-O-NEXT: Starting llvm::Function pass manager run.			; CHECK-O-NEXT: Starting llvm::Function pass manager run.
	; CHECK-O-NEXT: Running pass: LoopSimplifyPass			; CHECK-O-NEXT: Running pass: LoopSimplifyPass
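
The reordering checked in this hunk can be summarized with a small stand-alone model. This is only a sketch: the pass names are copied from the CHECK lines above (for the -O2/-O3/-Os configurations, with the analysis lines omitted), while `Pipeline`, `oldOrder`, `newOrder`, `runsRightAfter`, `countOf`, and `checkPatchInvariants` are hypothetical helpers invented for illustration and are not part of LLVM.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Toy model of a pass pipeline: just the pass names, in execution order.
using Pipeline = std::vector<std::string>;

// Order read from the left (old) column of the CHECK lines.
inline Pipeline oldOrder() {
  return {"LoopVectorizePass", "LoopLoadEliminationPass", "VectorCombinePass",
          "InstCombinePass",   "SimplifyCFGPass",         "SLPVectorizerPass",
          "VectorCombinePass", "InstCombinePass"};
}

// Order read from the right (new) column of the CHECK lines.
inline Pipeline newOrder() {
  return {"LoopVectorizePass",       "VectorCombinePass", "EarlyCSEPass",
          "LoopLoadEliminationPass", "InstCombinePass",   "SimplifyCFGPass",
          "SLPVectorizerPass",       "InstCombinePass"};
}

// True if `second` is the pass immediately following the first occurrence
// of `first` in the pipeline.
inline bool runsRightAfter(const Pipeline &p, const std::string &first,
                           const std::string &second) {
  auto it = std::find(p.begin(), p.end(), first);
  if (it == p.end())
    return false;
  auto next = std::next(it);
  return next != p.end() && *next == second;
}

// Number of times a pass name appears in the pipeline.
inline int countOf(const Pipeline &p, const std::string &name) {
  return static_cast<int>(std::count(p.begin(), p.end(), name));
}

// The three changes described in the summary, expressed as assertions
// over the modeled new order.
inline void checkPatchInvariants() {
  const Pipeline after = newOrder();
  assert(runsRightAfter(after, "LoopVectorizePass", "VectorCombinePass"));
  assert(runsRightAfter(after, "VectorCombinePass", "EarlyCSEPass"));
  assert(countOf(after, "VectorCombinePass") == 1); // no run after SLP
}
```

Calling `checkPatchInvariants()` from a `main` function exercises the model. The real guarantee comes from the FileCheck tests themselves, which match the actual pass manager debug output.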

llvm/test/Other/new-pm-thinlto-defaults.ll

	; CHECK-POSTLINK-O-NEXT: Starting llvm::Function pass manager run			; CHECK-POSTLINK-O-NEXT: Starting llvm::Function pass manager run
	; CHECK-POSTLINK-O-NEXT: Running pass: LoopSimplifyPass			; CHECK-POSTLINK-O-NEXT: Running pass: LoopSimplifyPass
	; CHECK-POSTLINK-O-NEXT: Running pass: LCSSAPass			; CHECK-POSTLINK-O-NEXT: Running pass: LCSSAPass
	; CHECK-POSTLINK-O-NEXT: Finished llvm::Function pass manager run			; CHECK-POSTLINK-O-NEXT: Finished llvm::Function pass manager run
	; CHECK-POSTLINK-O-NEXT: Running pass: LoopDistributePass			; CHECK-POSTLINK-O-NEXT: Running pass: LoopDistributePass
	; CHECK-POSTLINK-O-NEXT: Running pass: LoopVectorizePass			; CHECK-POSTLINK-O-NEXT: Running pass: LoopVectorizePass
	; CHECK-POSTLINK-O-NEXT: Running analysis: BlockFrequencyAnalysis			; CHECK-POSTLINK-O-NEXT: Running analysis: BlockFrequencyAnalysis
	; CHECK-POSTLINK-O-NEXT: Running analysis: BranchProbabilityAnalysis			; CHECK-POSTLINK-O-NEXT: Running analysis: BranchProbabilityAnalysis
				; CHECK-POSTLINK-O-NEXT: Running pass: VectorCombinePass
				; CHECK-POSTLINK-O-NEXT: Running pass: EarlyCSEPass
	; CHECK-POSTLINK-O-NEXT: Running pass: LoopLoadEliminationPass			; CHECK-POSTLINK-O-NEXT: Running pass: LoopLoadEliminationPass
	; CHECK-POSTLINK-O-NEXT: Running analysis: LoopAccessAnalysis			; CHECK-POSTLINK-O-NEXT: Running analysis: LoopAccessAnalysis
	; CHECK-POSTLINK-O-NEXT: Running pass: VectorCombinePass
	; CHECK-POSTLINK-O-NEXT: Running pass: InstCombinePass			; CHECK-POSTLINK-O-NEXT: Running pass: InstCombinePass
	; CHECK-POSTLINK-O-NEXT: Running pass: SimplifyCFGPass			; CHECK-POSTLINK-O-NEXT: Running pass: SimplifyCFGPass
	; CHECK-POSTLINK-O2-NEXT: Running pass: SLPVectorizerPass			; CHECK-POSTLINK-O2-NEXT: Running pass: SLPVectorizerPass
	; CHECK-POSTLINK-O3-NEXT: Running pass: SLPVectorizerPass			; CHECK-POSTLINK-O3-NEXT: Running pass: SLPVectorizerPass
	; CHECK-POSTLINK-Os-NEXT: Running pass: SLPVectorizerPass			; CHECK-POSTLINK-Os-NEXT: Running pass: SLPVectorizerPass
	; CHECK-POSTLINK-O2-NEXT: Running pass: VectorCombinePass
	; CHECK-POSTLINK-O3-NEXT: Running pass: VectorCombinePass
	; CHECK-POSTLINK-Os-NEXT: Running pass: VectorCombinePass
	; CHECK-POSTLINK-O-NEXT: Running pass: InstCombinePass			; CHECK-POSTLINK-O-NEXT: Running pass: InstCombinePass
	; CHECK-POSTLINK-O-NEXT: Running pass: LoopUnrollPass			; CHECK-POSTLINK-O-NEXT: Running pass: LoopUnrollPass
	; CHECK-POSTLINK-O-NEXT: Running pass: WarnMissedTransformationsPass			; CHECK-POSTLINK-O-NEXT: Running pass: WarnMissedTransformationsPass
	; CHECK-POSTLINK-O-NEXT: Running pass: InstCombinePass			; CHECK-POSTLINK-O-NEXT: Running pass: InstCombinePass
	; CHECK-POSTLINK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}OptimizationRemarkEmitterAnalysis			; CHECK-POSTLINK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}OptimizationRemarkEmitterAnalysis
	; CHECK-POSTLINK-O-NEXT: Running pass: FunctionToLoopPassAdaptor<{{.*}}LICMPass			; CHECK-POSTLINK-O-NEXT: Running pass: FunctionToLoopPassAdaptor<{{.*}}LICMPass
	; CHECK-POSTLINK-O-NEXT: Starting llvm::Function pass manager run			; CHECK-POSTLINK-O-NEXT: Starting llvm::Function pass manager run
	; CHECK-POSTLINK-O-NEXT: Running pass: LoopSimplifyPass			; CHECK-POSTLINK-O-NEXT: Running pass: LoopSimplifyPass

llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll

	; CHECK-EXT: Running pass: {{.*}}::Bye			; CHECK-EXT: Running pass: {{.*}}::Bye
	; CHECK-O-NEXT: Running pass: FunctionToLoopPassAdaptor<{{.*}}LoopRotatePass			; CHECK-O-NEXT: Running pass: FunctionToLoopPassAdaptor<{{.*}}LoopRotatePass
	; CHECK-O-NEXT: Starting {{.*}}Function pass manager run			; CHECK-O-NEXT: Starting {{.*}}Function pass manager run
	; CHECK-O-NEXT: Running pass: LoopSimplifyPass			; CHECK-O-NEXT: Running pass: LoopSimplifyPass
	; CHECK-O-NEXT: Running pass: LCSSAPass			; CHECK-O-NEXT: Running pass: LCSSAPass
	; CHECK-O-NEXT: Finished {{.*}}Function pass manager run			; CHECK-O-NEXT: Finished {{.*}}Function pass manager run
	; CHECK-O-NEXT: Running pass: LoopDistributePass			; CHECK-O-NEXT: Running pass: LoopDistributePass
	; CHECK-O-NEXT: Running pass: LoopVectorizePass			; CHECK-O-NEXT: Running pass: LoopVectorizePass
				; CHECK-O-NEXT: Running pass: VectorCombinePass
				; CHECK-O-NEXT: Running pass: EarlyCSEPass
	; CHECK-O-NEXT: Running pass: LoopLoadEliminationPass			; CHECK-O-NEXT: Running pass: LoopLoadEliminationPass
	; CHECK-O-NEXT: Running analysis: LoopAccessAnalysis			; CHECK-O-NEXT: Running analysis: LoopAccessAnalysis
	; CHECK-O-NEXT: Running pass: VectorCombinePass
	; CHECK-O-NEXT: Running pass: InstCombinePass			; CHECK-O-NEXT: Running pass: InstCombinePass
	; CHECK-O-NEXT: Running pass: SimplifyCFGPass			; CHECK-O-NEXT: Running pass: SimplifyCFGPass
	; CHECK-O2-NEXT: Running pass: SLPVectorizerPass			; CHECK-O2-NEXT: Running pass: SLPVectorizerPass
	; CHECK-O3-NEXT: Running pass: SLPVectorizerPass			; CHECK-O3-NEXT: Running pass: SLPVectorizerPass
	; CHECK-Os-NEXT: Running pass: SLPVectorizerPass			; CHECK-Os-NEXT: Running pass: SLPVectorizerPass
	; CHECK-O2-NEXT: Running pass: VectorCombinePass
	; CHECK-O3-NEXT: Running pass: VectorCombinePass
	; CHECK-Os-NEXT: Running pass: VectorCombinePass
	; CHECK-O-NEXT: Running pass: InstCombinePass			; CHECK-O-NEXT: Running pass: InstCombinePass
	; CHECK-O-NEXT: Running pass: LoopUnrollPass			; CHECK-O-NEXT: Running pass: LoopUnrollPass
	; CHECK-O-NEXT: Running pass: WarnMissedTransformationsPass			; CHECK-O-NEXT: Running pass: WarnMissedTransformationsPass
	; CHECK-O-NEXT: Running pass: InstCombinePass			; CHECK-O-NEXT: Running pass: InstCombinePass
	; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}OptimizationRemarkEmitterAnalysis			; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}OptimizationRemarkEmitterAnalysis
	; CHECK-O-NEXT: Running pass: FunctionToLoopPassAdaptor<{{.*}}LICMPass			; CHECK-O-NEXT: Running pass: FunctionToLoopPassAdaptor<{{.*}}LICMPass
	; CHECK-O-NEXT: Starting {{.*}}Function pass manager run			; CHECK-O-NEXT: Starting {{.*}}Function pass manager run
	; CHECK-O-NEXT: Running pass: LoopSimplifyPass			; CHECK-O-NEXT: Running pass: LoopSimplifyPass

llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll

	; CHECK-EXT: Running pass: {{.*}}::Bye			; CHECK-EXT: Running pass: {{.*}}::Bye
	; CHECK-O-NEXT: Running pass: FunctionToLoopPassAdaptor<{{.*}}LoopRotatePass			; CHECK-O-NEXT: Running pass: FunctionToLoopPassAdaptor<{{.*}}LoopRotatePass
	; CHECK-O-NEXT: Starting {{.*}}Function pass manager run			; CHECK-O-NEXT: Starting {{.*}}Function pass manager run
	; CHECK-O-NEXT: Running pass: LoopSimplifyPass			; CHECK-O-NEXT: Running pass: LoopSimplifyPass
	; CHECK-O-NEXT: Running pass: LCSSAPass			; CHECK-O-NEXT: Running pass: LCSSAPass
	; CHECK-O-NEXT: Finished {{.*}}Function pass manager run			; CHECK-O-NEXT: Finished {{.*}}Function pass manager run
	; CHECK-O-NEXT: Running pass: LoopDistributePass			; CHECK-O-NEXT: Running pass: LoopDistributePass
	; CHECK-O-NEXT: Running pass: LoopVectorizePass			; CHECK-O-NEXT: Running pass: LoopVectorizePass
				; CHECK-O-NEXT: Running pass: VectorCombinePass
				; CHECK-O-NEXT: Running pass: EarlyCSEPass
	; CHECK-O-NEXT: Running pass: LoopLoadEliminationPass			; CHECK-O-NEXT: Running pass: LoopLoadEliminationPass
	; CHECK-O-NEXT: Running analysis: LoopAccessAnalysis			; CHECK-O-NEXT: Running analysis: LoopAccessAnalysis
	; CHECK-O-NEXT: Running pass: VectorCombinePass
	; CHECK-O-NEXT: Running pass: InstCombinePass			; CHECK-O-NEXT: Running pass: InstCombinePass
	; CHECK-O-NEXT: Running pass: SimplifyCFGPass			; CHECK-O-NEXT: Running pass: SimplifyCFGPass
	; CHECK-O2-NEXT: Running pass: SLPVectorizerPass			; CHECK-O2-NEXT: Running pass: SLPVectorizerPass
	; CHECK-O3-NEXT: Running pass: SLPVectorizerPass			; CHECK-O3-NEXT: Running pass: SLPVectorizerPass
	; CHECK-Os-NEXT: Running pass: SLPVectorizerPass			; CHECK-Os-NEXT: Running pass: SLPVectorizerPass
	; CHECK-O2-NEXT: Running pass: VectorCombinePass
	; CHECK-O3-NEXT: Running pass: VectorCombinePass
	; CHECK-Os-NEXT: Running pass: VectorCombinePass
	; CHECK-O-NEXT: Running pass: InstCombinePass			; CHECK-O-NEXT: Running pass: InstCombinePass
	; CHECK-O-NEXT: Running pass: LoopUnrollPass			; CHECK-O-NEXT: Running pass: LoopUnrollPass
	; CHECK-O-NEXT: Running pass: WarnMissedTransformationsPass			; CHECK-O-NEXT: Running pass: WarnMissedTransformationsPass
	; CHECK-O-NEXT: Running pass: InstCombinePass			; CHECK-O-NEXT: Running pass: InstCombinePass
	; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}OptimizationRemarkEmitterAnalysis			; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}OptimizationRemarkEmitterAnalysis
	; CHECK-O-NEXT: Running pass: FunctionToLoopPassAdaptor<{{.*}}LICMPass			; CHECK-O-NEXT: Running pass: FunctionToLoopPassAdaptor<{{.*}}LICMPass
	; CHECK-O-NEXT: Starting {{.*}}Function pass manager run			; CHECK-O-NEXT: Starting {{.*}}Function pass manager run
	; CHECK-O-NEXT: Running pass: LoopSimplifyPass			; CHECK-O-NEXT: Running pass: LoopSimplifyPass

llvm/test/Other/opt-O2-pipeline.ll

	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Loop Access Analysis			; CHECK-NEXT: Loop Access Analysis
	; CHECK-NEXT: Demanded bits analysis			; CHECK-NEXT: Demanded bits analysis
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: Inject TLI Mappings			; CHECK-NEXT: Inject TLI Mappings
	; CHECK-NEXT: Loop Vectorization			; CHECK-NEXT: Loop Vectorization
				; CHECK-NEXT: Optimize scalar/vector ops
				; CHECK-NEXT: Early CSE
	; CHECK-NEXT: Canonicalize natural loops			; CHECK-NEXT: Canonicalize natural loops
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Loop Access Analysis			; CHECK-NEXT: Loop Access Analysis
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Loop Load Elimination			; CHECK-NEXT: Loop Load Elimination
	; CHECK-NEXT: Optimize scalar/vector ops
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: Combine redundant instructions			; CHECK-NEXT: Combine redundant instructions
	; CHECK-NEXT: Simplify the CFG			; CHECK-NEXT: Simplify the CFG
	; CHECK-NEXT: Dominator Tree Construction			; CHECK-NEXT: Dominator Tree Construction
	; CHECK-NEXT: Natural Loop Information			; CHECK-NEXT: Natural Loop Information
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Demanded bits analysis			; CHECK-NEXT: Demanded bits analysis
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: SLP Vectorizer			; CHECK-NEXT: SLP Vectorizer
	; CHECK-NEXT: Optimize scalar/vector ops
	; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: Combine redundant instructions			; CHECK-NEXT: Combine redundant instructions
	; CHECK-NEXT: Canonicalize natural loops			; CHECK-NEXT: Canonicalize natural loops
	; CHECK-NEXT: LCSSA Verifier			; CHECK-NEXT: LCSSA Verifier
	; CHECK-NEXT: Loop-Closed SSA Form Pass			; CHECK-NEXT: Loop-Closed SSA Form Pass
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Loop Pass Manager			; CHECK-NEXT: Loop Pass Manager
	; CHECK-NEXT: Unroll loops			; CHECK-NEXT: Unroll loops

llvm/test/Other/opt-O3-pipeline.ll

	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Loop Access Analysis			; CHECK-NEXT: Loop Access Analysis
	; CHECK-NEXT: Demanded bits analysis			; CHECK-NEXT: Demanded bits analysis
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: Inject TLI Mappings			; CHECK-NEXT: Inject TLI Mappings
	; CHECK-NEXT: Loop Vectorization			; CHECK-NEXT: Loop Vectorization
				; CHECK-NEXT: Optimize scalar/vector ops
				; CHECK-NEXT: Early CSE
	; CHECK-NEXT: Canonicalize natural loops			; CHECK-NEXT: Canonicalize natural loops
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Loop Access Analysis			; CHECK-NEXT: Loop Access Analysis
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Loop Load Elimination			; CHECK-NEXT: Loop Load Elimination
	; CHECK-NEXT: Optimize scalar/vector ops
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: Combine redundant instructions			; CHECK-NEXT: Combine redundant instructions
	; CHECK-NEXT: Simplify the CFG			; CHECK-NEXT: Simplify the CFG
	; CHECK-NEXT: Dominator Tree Construction			; CHECK-NEXT: Dominator Tree Construction
	; CHECK-NEXT: Natural Loop Information			; CHECK-NEXT: Natural Loop Information
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Demanded bits analysis			; CHECK-NEXT: Demanded bits analysis
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: SLP Vectorizer			; CHECK-NEXT: SLP Vectorizer
	; CHECK-NEXT: Optimize scalar/vector ops
	; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: Combine redundant instructions			; CHECK-NEXT: Combine redundant instructions
	; CHECK-NEXT: Canonicalize natural loops			; CHECK-NEXT: Canonicalize natural loops
	; CHECK-NEXT: LCSSA Verifier			; CHECK-NEXT: LCSSA Verifier
	; CHECK-NEXT: Loop-Closed SSA Form Pass			; CHECK-NEXT: Loop-Closed SSA Form Pass
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Loop Pass Manager			; CHECK-NEXT: Loop Pass Manager
	; CHECK-NEXT: Unroll loops			; CHECK-NEXT: Unroll loops

llvm/test/Other/opt-Os-pipeline.ll

	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Loop Access Analysis			; CHECK-NEXT: Loop Access Analysis
	; CHECK-NEXT: Demanded bits analysis			; CHECK-NEXT: Demanded bits analysis
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: Inject TLI Mappings			; CHECK-NEXT: Inject TLI Mappings
	; CHECK-NEXT: Loop Vectorization			; CHECK-NEXT: Loop Vectorization
				; CHECK-NEXT: Optimize scalar/vector ops
				; CHECK-NEXT: Early CSE
	; CHECK-NEXT: Canonicalize natural loops			; CHECK-NEXT: Canonicalize natural loops
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Loop Access Analysis			; CHECK-NEXT: Loop Access Analysis
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Loop Load Elimination			; CHECK-NEXT: Loop Load Elimination
	; CHECK-NEXT: Optimize scalar/vector ops
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: Combine redundant instructions			; CHECK-NEXT: Combine redundant instructions
	; CHECK-NEXT: Simplify the CFG			; CHECK-NEXT: Simplify the CFG
	; CHECK-NEXT: Dominator Tree Construction			; CHECK-NEXT: Dominator Tree Construction
	; CHECK-NEXT: Natural Loop Information			; CHECK-NEXT: Natural Loop Information
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Demanded bits analysis			; CHECK-NEXT: Demanded bits analysis
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: SLP Vectorizer			; CHECK-NEXT: SLP Vectorizer
	; CHECK-NEXT: Optimize scalar/vector ops
	; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: Combine redundant instructions			; CHECK-NEXT: Combine redundant instructions
	; CHECK-NEXT: Canonicalize natural loops			; CHECK-NEXT: Canonicalize natural loops
	; CHECK-NEXT: LCSSA Verifier			; CHECK-NEXT: LCSSA Verifier
	; CHECK-NEXT: Loop-Closed SSA Form Pass			; CHECK-NEXT: Loop-Closed SSA Form Pass
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Loop Pass Manager			; CHECK-NEXT: Loop Pass Manager
	; CHECK-NEXT: Unroll loops			; CHECK-NEXT: Unroll loops

llvm/test/Transforms/PhaseOrdering/X86/addsub.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -O3 -S -mtriple=x86_64-- -mattr=avx | FileCheck %s			; RUN: opt < %s -O3 -S -mtriple=x86_64-- -mattr=avx | FileCheck %s
	; RUN: opt -passes='default<O3>' -S < %s | FileCheck %s			; RUN: opt -passes='default<O3>' -S < %s | FileCheck %s

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	; TODO: Ideally, this should reach the backend with 1 fsub, 1 fadd, and 1 shuffle.			; Ideally, this should reach the backend with 1 fsub, 1 fadd, and 1 shuffle.
	; That may require some coordination between VectorCombine, SLP, and other passes.			; That may require some coordination between VectorCombine, SLP, and other passes.
	; The end goal is to get a single "vaddsubps" instruction for x86 with AVX.			; The end goal is to get a single "vaddsubps" instruction for x86 with AVX.

	define <4 x float> @PR45015(<4 x float> %arg, <4 x float> %arg1) {			define <4 x float> @PR45015(<4 x float> %arg, <4 x float> %arg1) {
	; CHECK-LABEL: @PR45015(			; CHECK-LABEL: @PR45015(
	; CHECK-NEXT: [[TMP1:%.*]] = fsub <4 x float> [[ARG:%.*]], [[ARG1:%.*]]			; CHECK-NEXT: [[TMP1:%.*]] = fsub <4 x float> [[ARG:%.*]], [[ARG1:%.*]]
	; CHECK-NEXT: [[TMP2:%.*]] = fadd <4 x float> [[ARG]], [[ARG1]]			; CHECK-NEXT: [[TMP2:%.*]] = fadd <4 x float> [[ARG]], [[ARG1]]
	; CHECK-NEXT: [[T8:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> [[TMP2]], <4 x i32> <i32 0, i32 5, i32 undef, i32 undef>			; CHECK-NEXT: [[T16:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> [[TMP2]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>
	; CHECK-NEXT: [[TMP3:%.*]] = fsub <4 x float> [[ARG]], [[ARG1]]
	; CHECK-NEXT: [[T12:%.*]] = shufflevector <4 x float> [[T8]], <4 x float> [[TMP3]], <4 x i32> <i32 0, i32 1, i32 6, i32 undef>
	; CHECK-NEXT: [[TMP4:%.*]] = fadd <4 x float> [[ARG]], [[ARG1]]
	; CHECK-NEXT: [[T16:%.*]] = shufflevector <4 x float> [[T12]], <4 x float> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 2, i32 7>
	; CHECK-NEXT: ret <4 x float> [[T16]]			; CHECK-NEXT: ret <4 x float> [[T16]]
	;			;
	%t = extractelement <4 x float> %arg, i32 0			%t = extractelement <4 x float> %arg, i32 0
	%t2 = extractelement <4 x float> %arg1, i32 0			%t2 = extractelement <4 x float> %arg1, i32 0
	%t3 = fsub float %t, %t2			%t3 = fsub float %t, %t2
	%t4 = insertelement <4 x float> undef, float %t3, i32 0			%t4 = insertelement <4 x float> undef, float %t3, i32 0
	%t5 = extractelement <4 x float> %arg, i32 1			%t5 = extractelement <4 x float> %arg, i32 1
	%t6 = extractelement <4 x float> %arg1, i32 1			%t6 = extractelement <4 x float> %arg1, i32 1
	Show All 12 Lines
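
The PR45015 checks above replace a chain of three shuffles with a single "select" shuffle (mask `<0, 5, 2, 7>`), where lane i of the result is taken from lane i of one of the two inputs. The following stand-alone sketch models that equivalence on plain arrays. It is an illustration only: `Vec4`, `Mask4`, `kUndef`, `shuffle`, `chainOfThreeShuffles`, and `singleSelectShuffle` are hypothetical names, undef lanes are arbitrarily modeled as `0.0f`, and none of this is how InstCombine is implemented.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;
using Mask4 = std::array<int, 4>;
constexpr int kUndef = -1; // stands in for an "undef" mask element

// Model of shufflevector on two 4-lane inputs: indices 0..3 select from `a`,
// indices 4..7 select from `b`. Undef lanes are set to 0.0f in this sketch.
inline Vec4 shuffle(const Vec4 &a, const Vec4 &b, const Mask4 &mask) {
  Vec4 r{};
  for (int i = 0; i < 4; ++i) {
    const int m = mask[i];
    assert(m >= kUndef && m < 8);
    if (m == kUndef)
      r[i] = 0.0f;
    else
      r[i] = m < 4 ? a[m] : b[m - 4];
  }
  return r;
}

inline Vec4 fsub(const Vec4 &x, const Vec4 &y) {
  return Vec4{{x[0] - y[0], x[1] - y[1], x[2] - y[2], x[3] - y[3]}};
}

inline Vec4 fadd(const Vec4 &x, const Vec4 &y) {
  return Vec4{{x[0] + y[0], x[1] + y[1], x[2] + y[2], x[3] + y[3]}};
}

// Shape of the old checks: repeated fsub/fadd feeding three chained shuffles.
inline Vec4 chainOfThreeShuffles(const Vec4 &x, const Vec4 &y) {
  const Vec4 t8 = shuffle(fsub(x, y), fadd(x, y), Mask4{{0, 5, kUndef, kUndef}});
  const Vec4 t12 = shuffle(t8, fsub(x, y), Mask4{{0, 1, 6, kUndef}});
  return shuffle(t12, fadd(x, y), Mask4{{0, 1, 2, 7}});
}

// Shape of the new checks: one fsub, one fadd, one select shuffle.
inline Vec4 singleSelectShuffle(const Vec4 &x, const Vec4 &y) {
  return shuffle(fsub(x, y), fadd(x, y), Mask4{{0, 5, 2, 7}});
}
```

Both functions yield subtraction results in lanes 0 and 2 and addition results in lanes 1 and 3, which is the lane pattern the test comment hopes to lower to a single `vaddsubps`.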

	; PR42022 - https://bugs.llvm.org/show_bug.cgi?id=42022			; PR42022 - https://bugs.llvm.org/show_bug.cgi?id=42022

	%struct.Vector4 = type { float, float, float, float }			%struct.Vector4 = type { float, float, float, float }

	define { <2 x float>, <2 x float> } @add_aggregate(<2 x float> %a0, <2 x float> %a1, <2 x float> %b0, <2 x float> %b1) {			define { <2 x float>, <2 x float> } @add_aggregate(<2 x float> %a0, <2 x float> %a1, <2 x float> %b0, <2 x float> %b1) {
	; CHECK-LABEL: @add_aggregate(			; CHECK-LABEL: @add_aggregate(
	; CHECK-NEXT: [[TMP1:%.*]] = fadd <2 x float> [[A0:%.*]], [[B0:%.*]]			; CHECK-NEXT: [[TMP1:%.*]] = fadd <2 x float> [[A0:%.*]], [[B0:%.*]]
	; CHECK-NEXT: [[TMP2:%.*]] = fadd <2 x float> [[A0]], [[B0]]			; CHECK-NEXT: [[TMP2:%.*]] = fadd <2 x float> [[A1:%.*]], [[B1:%.*]]
	; CHECK-NEXT: [[RETVAL_0_1_INSERT:%.*]] = shufflevector <2 x float> [[TMP1]], <2 x float> [[TMP2]], <2 x i32> <i32 0, i32 3>			; CHECK-NEXT: [[FCA_0_INSERT:%.*]] = insertvalue { <2 x float>, <2 x float> } undef, <2 x float> [[TMP1]], 0
	; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x float> [[A1:%.*]], [[B1:%.*]]			; CHECK-NEXT: [[FCA_1_INSERT:%.*]] = insertvalue { <2 x float>, <2 x float> } [[FCA_0_INSERT]], <2 x float> [[TMP2]], 1
	; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x float> [[A1]], [[B1]]
	; CHECK-NEXT: [[RETVAL_1_1_INSERT:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> [[TMP4]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: [[FCA_0_INSERT:%.*]] = insertvalue { <2 x float>, <2 x float> } undef, <2 x float> [[RETVAL_0_1_INSERT]], 0
	; CHECK-NEXT: [[FCA_1_INSERT:%.*]] = insertvalue { <2 x float>, <2 x float> } [[FCA_0_INSERT]], <2 x float> [[RETVAL_1_1_INSERT]], 1
	; CHECK-NEXT: ret { <2 x float>, <2 x float> } [[FCA_1_INSERT]]			; CHECK-NEXT: ret { <2 x float>, <2 x float> } [[FCA_1_INSERT]]
	;			;
	%a00 = extractelement <2 x float> %a0, i32 0			%a00 = extractelement <2 x float> %a0, i32 0
	%b00 = extractelement <2 x float> %b0, i32 0			%b00 = extractelement <2 x float> %b0, i32 0
	%add = fadd float %a00, %b00			%add = fadd float %a00, %b00
	%retval.0.0.insert = insertelement <2 x float> undef, float %add, i32 0			%retval.0.0.insert = insertelement <2 x float> undef, float %add, i32 0
	%a01 = extractelement <2 x float> %a0, i32 1			%a01 = extractelement <2 x float> %a0, i32 1
	%b01 = extractelement <2 x float> %b0, i32 1			%b01 = extractelement <2 x float> %b0, i32 1
	Show All 13 Lines
	}			}

	define void @add_aggregate_store(<2 x float> %a0, <2 x float> %a1, <2 x float> %b0, <2 x float> %b1, %struct.Vector4* nocapture dereferenceable(16) %r) {			define void @add_aggregate_store(<2 x float> %a0, <2 x float> %a1, <2 x float> %b0, <2 x float> %b1, %struct.Vector4* nocapture dereferenceable(16) %r) {
	; CHECK-LABEL: @add_aggregate_store(			; CHECK-LABEL: @add_aggregate_store(
	; CHECK-NEXT: [[TMP1:%.*]] = fadd <2 x float> [[A0:%.*]], [[B0:%.*]]			; CHECK-NEXT: [[TMP1:%.*]] = fadd <2 x float> [[A0:%.*]], [[B0:%.*]]
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x float> [[TMP1]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x float> [[TMP1]], i32 0
	; CHECK-NEXT: [[R0:%.*]] = getelementptr inbounds [[STRUCT_VECTOR4:%.*]], %struct.Vector4* [[R:%.*]], i64 0, i32 0			; CHECK-NEXT: [[R0:%.*]] = getelementptr inbounds [[STRUCT_VECTOR4:%.*]], %struct.Vector4* [[R:%.*]], i64 0, i32 0
	; CHECK-NEXT: store float [[TMP2]], float* [[R0]], align 4			; CHECK-NEXT: store float [[TMP2]], float* [[R0]], align 4
	; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x float> [[A0]], [[B0]]			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x float> [[TMP1]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP3]], i32 1
	; CHECK-NEXT: [[R1:%.*]] = getelementptr inbounds [[STRUCT_VECTOR4]], %struct.Vector4* [[R]], i64 0, i32 1			; CHECK-NEXT: [[R1:%.*]] = getelementptr inbounds [[STRUCT_VECTOR4]], %struct.Vector4* [[R]], i64 0, i32 1
	; CHECK-NEXT: store float [[TMP4]], float* [[R1]], align 4			; CHECK-NEXT: store float [[TMP3]], float* [[R1]], align 4
	; CHECK-NEXT: [[TMP5:%.*]] = fadd <2 x float> [[A1:%.*]], [[B1:%.*]]			; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x float> [[A1:%.*]], [[B1:%.*]]
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP5]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP4]], i32 0
	; CHECK-NEXT: [[R2:%.*]] = getelementptr inbounds [[STRUCT_VECTOR4]], %struct.Vector4* [[R]], i64 0, i32 2			; CHECK-NEXT: [[R2:%.*]] = getelementptr inbounds [[STRUCT_VECTOR4]], %struct.Vector4* [[R]], i64 0, i32 2
	; CHECK-NEXT: store float [[TMP6]], float* [[R2]], align 4			; CHECK-NEXT: store float [[TMP5]], float* [[R2]], align 4
	; CHECK-NEXT: [[TMP7:%.*]] = fadd <2 x float> [[A1]], [[B1]]			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP4]], i32 1
	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x float> [[TMP7]], i32 1
	; CHECK-NEXT: [[R3:%.*]] = getelementptr inbounds [[STRUCT_VECTOR4]], %struct.Vector4* [[R]], i64 0, i32 3			; CHECK-NEXT: [[R3:%.*]] = getelementptr inbounds [[STRUCT_VECTOR4]], %struct.Vector4* [[R]], i64 0, i32 3
	; CHECK-NEXT: store float [[TMP8]], float* [[R3]], align 4			; CHECK-NEXT: store float [[TMP6]], float* [[R3]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%a00 = extractelement <2 x float> %a0, i32 0			%a00 = extractelement <2 x float> %a0, i32 0
	%b00 = extractelement <2 x float> %b0, i32 0			%b00 = extractelement <2 x float> %b0, i32 0
	%add = fadd float %a00, %b00			%add = fadd float %a00, %b00
	%r0 = getelementptr inbounds %struct.Vector4, %struct.Vector4* %r, i64 0, i32 0			%r0 = getelementptr inbounds %struct.Vector4, %struct.Vector4* %r, i64 0, i32 0
	store float %add, float* %r0, align 4			store float %add, float* %r0, align 4
	%a01 = extractelement <2 x float> %a0, i32 1			%a01 = extractelement <2 x float> %a0, i32 1
	Show All 16 Lines
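
In the `@add_aggregate_store` checks above, the old column recomputes `fadd <2 x float> [[A0]], [[B0]]` before each `extractelement`, while the new column extracts both lanes from a single `fadd`. That is the kind of redundancy a common-subexpression pass removes. The sketch below is a toy local CSE over a hypothetical `Inst` record, written only to illustrate the idea under the assumption that every modeled instruction is side-effect free. It is not LLVM's EarlyCSE implementation, and `Inst`, `eliminateCommonSubexpressions`, and `oldStoreChecksPrefix` are made-up names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical toy instruction: `result = opcode lhs, rhs`. Not an LLVM type.
struct Inst {
  std::string result, opcode, lhs, rhs;
};

// Toy local CSE: keep the first instruction for each (opcode, lhs, rhs) key,
// drop later duplicates, and rewrite uses of a dropped result to the kept one.
// Assumes every instruction is pure (no stores, calls, or other side effects).
inline std::vector<Inst>
eliminateCommonSubexpressions(const std::vector<Inst> &block) {
  std::map<std::tuple<std::string, std::string, std::string>, std::string>
      available;
  std::map<std::string, std::string> replacedBy;
  std::vector<Inst> out;

  auto resolve = [&](const std::string &name) -> std::string {
    auto it = replacedBy.find(name);
    return it == replacedBy.end() ? name : it->second;
  };

  for (const Inst &inst : block) {
    assert(!inst.result.empty() && "every toy instruction defines a value");
    Inst cur{inst.result, inst.opcode, resolve(inst.lhs), resolve(inst.rhs)};
    auto key = std::make_tuple(cur.opcode, cur.lhs, cur.rhs);
    auto found = available.find(key);
    if (found != available.end()) {
      replacedBy[cur.result] = found->second; // duplicate: reuse earlier value
      continue;
    }
    available.emplace(key, cur.result);
    out.push_back(cur);
  }
  return out;
}

// First four value-producing instructions of the old checks, with the
// stores and address computations left out.
inline std::vector<Inst> oldStoreChecksPrefix() {
  return {{"%tmp1", "fadd", "%a0", "%b0"},
          {"%tmp2", "extractelement", "%tmp1", "0"},
          {"%tmp3", "fadd", "%a0", "%b0"},
          {"%tmp4", "extractelement", "%tmp3", "1"}};
}
```

Running `eliminateCommonSubexpressions(oldStoreChecksPrefix())` drops the second `fadd` and makes the lane-1 extract read from `%tmp1`, mirroring the shape of the new column.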


Diff 248184
