This is an archive of the discontinued LLVM Phabricator instance.

[PassManager] Run additional LICM before LoopRotate
ClosedPublic

Authored by lebedev.ri on Mar 24 2021, 2:39 AM.

Details

Summary

This is an alternative to D99204.
Better PhaseOrdering test TBD.

Loop rotation often has to perform code duplication
from header into preheader, which introduces PHI nodes.

In D99204, @thopre wrote:

With loop peeling, it is important that unnecessary PHIs be avoided or
it will lead to spurious peeling. One source of such PHIs is loop
rotation which creates PHIs for invariant loads. Those PHIs are
particularly problematic since loop peeling is now run as part of simple
loop unrolling before GVN is run, and are thus a source of spurious
peeling.

Note that while some of the loads can be hoisted and eventually
eliminated by instruction combining, this is not always possible due to
alignment issues. In particular, the motivating example [1] was a load
inside a class instance which cannot be hoisted because the `this`
pointer has an alignment of 1.

[1] http://lists.llvm.org/pipermail/llvm-dev/attachments/20210312/4ce73c47/attachment.cpp

Now, we could enhance LoopRotate to hoist loop-invariant code out of the loop
instead of duplicating it when that is possible, but wouldn't that duplicate
functionality? We already have LICM, and in fact we already run it right after
LoopRotate.

We could try moving LICM to before LoopRotate,
which is basically free from a compile-time perspective:
https://llvm-compile-time-tracker.com/compare.php?from=6c93eb4477d88af046b915bc955c03693b2cbb58&to=a4bee6d07732b1184c436da489040b912f0dc271&stat=instructions
But, looking at the statistics, I think it isn't great that we would no longer run LICM after LoopRotate; in particular:

statistic name | LoopRotate-LICM | LICM-LoopRotate | Δ | % | abs(%)
asm-printer.EmittedInsts | 9015930 | 9015799 | -131 | 0.00% | 0.00%
indvars.NumElimCmp | 3536 | 3544 | 8 | 0.23% | 0.23%
indvars.NumElimExt | 36725 | 36580 | -145 | -0.39% | 0.39%
indvars.NumElimIV | 1197 | 1187 | -10 | -0.84% | 0.84%
indvars.NumElimIdentity | 143 | 136 | -7 | -4.90% | 4.90%
indvars.NumElimRem | 4 | 5 | 1 | 25.00% | 25.00%
indvars.NumLFTR | 29842 | 29890 | 48 | 0.16% | 0.16%
indvars.NumReplaced | 2293 | 2227 | -66 | -2.88% | 2.88%
indvars.NumSimplifiedSDiv | 6 | 8 | 2 | 33.33% | 33.33%
indvars.NumWidened | 26438 | 26329 | -109 | -0.41% | 0.41%
instcount.TotalBlocks | 1178338 | 1173840 | -4498 | -0.38% | 0.38%
instcount.TotalFuncs | 111825 | 111829 | 4 | 0.00% | 0.00%
instcount.TotalInsts | 9905442 | 9896139 | -9303 | -0.09% | 0.09%
lcssa.NumLCSSA | 425871 | 423961 | -1910 | -0.45% | 0.45%
licm.NumHoisted | 378357 | 378753 | 396 | 0.10% | 0.10%
licm.NumMovedCalls | 2193 | 2208 | 15 | 0.68% | 0.68%
licm.NumMovedLoads | 35899 | 31821 | -4078 | -11.36% | 11.36%
licm.NumPromoted | 11178 | 11154 | -24 | -0.21% | 0.21%
licm.NumSunk | 13359 | 13587 | 228 | 1.71% | 1.71%
loop-delete.NumDeleted | 8547 | 8402 | -145 | -1.70% | 1.70%
loop-instsimplify.NumSimplified | 12876 | 11890 | -986 | -7.66% | 7.66%
loop-peel.NumPeeled | 1008 | 925 | -83 | -8.23% | 8.23%
loop-rotate.NumNotRotatedDueToHeaderSize | 368 | 365 | -3 | -0.82% | 0.82%
loop-rotate.NumRotated | 42015 | 42003 | -12 | -0.03% | 0.03%
loop-simplifycfg.NumLoopBlocksDeleted | 240 | 242 | 2 | 0.83% | 0.83%
loop-simplifycfg.NumLoopExitsDeleted | 497 | 20 | -477 | -95.98% | 95.98%
loop-simplifycfg.NumTerminatorsFolded | 618 | 336 | -282 | -45.63% | 45.63%
loop-unroll.NumCompletelyUnrolled | 11028 | 11032 | 4 | 0.04% | 0.04%
loop-unroll.NumUnrolled | 12608 | 12529 | -79 | -0.63% | 0.63%
mem2reg.NumDeadAlloca | 10222 | 10221 | -1 | -0.01% | 0.01%
mem2reg.NumPHIInsert | 192110 | 192106 | -4 | 0.00% | 0.00%
mem2reg.NumSingleStore | 637650 | 637643 | -7 | 0.00% | 0.00%
scalar-evolution.NumBruteForceTripCountsComputed | 814 | 812 | -2 | -0.25% | 0.25%
scalar-evolution.NumTripCountsComputed | 283108 | 282934 | -174 | -0.06% | 0.06%
scalar-evolution.NumTripCountsNotComputed | 106712 | 106718 | 6 | 0.01% | 0.01%
simple-loop-unswitch.NumBranches | 5178 | 4752 | -426 | -8.23% | 8.23%
simple-loop-unswitch.NumCostMultiplierSkipped | 914 | 503 | -411 | -44.97% | 44.97%
simple-loop-unswitch.NumSwitches | 20 | 18 | -2 | -10.00% | 10.00%
simple-loop-unswitch.NumTrivial | 183 | 95 | -88 | -48.09% | 48.09%

... but that actually regresses LICM (-12% licm.NumMovedLoads),
loop-simplifycfg (NumLoopExitsDeleted, NumTerminatorsFolded),
simple-loop-unswitch (NumTrivial).

What if we instead have LICM both before and after LoopRotate?

statistic name | LoopRotate-LICM | LICM-LoopRotate-LICM | Δ | % | abs(%)
asm-printer.EmittedInsts | 9015930 | 9014474 | -1456 | -0.02% | 0.02%
indvars.NumElimCmp | 3536 | 3546 | 10 | 0.28% | 0.28%
indvars.NumElimExt | 36725 | 36681 | -44 | -0.12% | 0.12%
indvars.NumElimIV | 1197 | 1185 | -12 | -1.00% | 1.00%
indvars.NumElimIdentity | 143 | 146 | 3 | 2.10% | 2.10%
indvars.NumElimRem | 4 | 5 | 1 | 25.00% | 25.00%
indvars.NumLFTR | 29842 | 29899 | 57 | 0.19% | 0.19%
indvars.NumReplaced | 2293 | 2299 | 6 | 0.26% | 0.26%
indvars.NumSimplifiedSDiv | 6 | 8 | 2 | 33.33% | 33.33%
indvars.NumWidened | 26438 | 26404 | -34 | -0.13% | 0.13%
instcount.TotalBlocks | 1178338 | 1173652 | -4686 | -0.40% | 0.40%
instcount.TotalFuncs | 111825 | 111829 | 4 | 0.00% | 0.00%
instcount.TotalInsts | 9905442 | 9895452 | -9990 | -0.10% | 0.10%
lcssa.NumLCSSA | 425871 | 425373 | -498 | -0.12% | 0.12%
licm.NumHoisted | 378357 | 383352 | 4995 | 1.32% | 1.32%
licm.NumMovedCalls | 2193 | 2204 | 11 | 0.50% | 0.50%
licm.NumMovedLoads | 35899 | 35755 | -144 | -0.40% | 0.40%
licm.NumPromoted | 11178 | 11163 | -15 | -0.13% | 0.13%
licm.NumSunk | 13359 | 14321 | 962 | 7.20% | 7.20%
loop-delete.NumDeleted | 8547 | 8538 | -9 | -0.11% | 0.11%
loop-instsimplify.NumSimplified | 12876 | 12041 | -835 | -6.48% | 6.48%
loop-peel.NumPeeled | 1008 | 924 | -84 | -8.33% | 8.33%
loop-rotate.NumNotRotatedDueToHeaderSize | 368 | 365 | -3 | -0.82% | 0.82%
loop-rotate.NumRotated | 42015 | 42005 | -10 | -0.02% | 0.02%
loop-simplifycfg.NumLoopBlocksDeleted | 240 | 241 | 1 | 0.42% | 0.42%
loop-simplifycfg.NumTerminatorsFolded | 618 | 619 | 1 | 0.16% | 0.16%
loop-unroll.NumCompletelyUnrolled | 11028 | 11029 | 1 | 0.01% | 0.01%
loop-unroll.NumUnrolled | 12608 | 12525 | -83 | -0.66% | 0.66%
mem2reg.NumPHIInsert | 192110 | 192073 | -37 | -0.02% | 0.02%
mem2reg.NumSingleStore | 637650 | 637652 | 2 | 0.00% | 0.00%
scalar-evolution.NumTripCountsComputed | 283108 | 282998 | -110 | -0.04% | 0.04%
scalar-evolution.NumTripCountsNotComputed | 106712 | 106691 | -21 | -0.02% | 0.02%
simple-loop-unswitch.NumBranches | 5178 | 5185 | 7 | 0.14% | 0.14%
simple-loop-unswitch.NumCostMultiplierSkipped | 914 | 925 | 11 | 1.20% | 1.20%
simple-loop-unswitch.NumTrivial | 183 | 179 | -4 | -2.19% | 2.19%

I.e. we end up with fewer instructions, less peeling, and more LICM activity;
also note that none of those four regressions appear here. Namely:

statistic name | LICM-LoopRotate | LICM-LoopRotate-LICM | Δ | % | abs(%)
asm-printer.EmittedInsts | 9015799 | 9014474 | -1325 | -0.01% | 0.01%
indvars.NumElimCmp | 3544 | 3546 | 2 | 0.06% | 0.06%
indvars.NumElimExt | 36580 | 36681 | 101 | 0.28% | 0.28%
indvars.NumElimIV | 1187 | 1185 | -2 | -0.17% | 0.17%
indvars.NumElimIdentity | 136 | 146 | 10 | 7.35% | 7.35%
indvars.NumLFTR | 29890 | 29899 | 9 | 0.03% | 0.03%
indvars.NumReplaced | 2227 | 2299 | 72 | 3.23% | 3.23%
indvars.NumWidened | 26329 | 26404 | 75 | 0.28% | 0.28%
instcount.TotalBlocks | 1173840 | 1173652 | -188 | -0.02% | 0.02%
instcount.TotalInsts | 9896139 | 9895452 | -687 | -0.01% | 0.01%
lcssa.NumLCSSA | 423961 | 425373 | 1412 | 0.33% | 0.33%
licm.NumHoisted | 378753 | 383352 | 4599 | 1.21% | 1.21%
licm.NumMovedCalls | 2208 | 2204 | -4 | -0.18% | 0.18%
licm.NumMovedLoads | 31821 | 35755 | 3934 | 12.36% | 12.36%
licm.NumPromoted | 11154 | 11163 | 9 | 0.08% | 0.08%
licm.NumSunk | 13587 | 14321 | 734 | 5.40% | 5.40%
loop-delete.NumDeleted | 8402 | 8538 | 136 | 1.62% | 1.62%
loop-instsimplify.NumSimplified | 11890 | 12041 | 151 | 1.27% | 1.27%
loop-peel.NumPeeled | 925 | 924 | -1 | -0.11% | 0.11%
loop-rotate.NumRotated | 42003 | 42005 | 2 | 0.00% | 0.00%
loop-simplifycfg.NumLoopBlocksDeleted | 242 | 241 | -1 | -0.41% | 0.41%
loop-simplifycfg.NumLoopExitsDeleted | 20 | 497 | 477 | 2385.00% | 2385.00%
loop-simplifycfg.NumTerminatorsFolded | 336 | 619 | 283 | 84.23% | 84.23%
loop-unroll.NumCompletelyUnrolled | 11032 | 11029 | -3 | -0.03% | 0.03%
loop-unroll.NumUnrolled | 12529 | 12525 | -4 | -0.03% | 0.03%
mem2reg.NumDeadAlloca | 10221 | 10222 | 1 | 0.01% | 0.01%
mem2reg.NumPHIInsert | 192106 | 192073 | -33 | -0.02% | 0.02%
mem2reg.NumSingleStore | 637643 | 637652 | 9 | 0.00% | 0.00%
scalar-evolution.NumBruteForceTripCountsComputed | 812 | 814 | 2 | 0.25% | 0.25%
scalar-evolution.NumTripCountsComputed | 282934 | 282998 | 64 | 0.02% | 0.02%
scalar-evolution.NumTripCountsNotComputed | 106718 | 106691 | -27 | -0.03% | 0.03%
simple-loop-unswitch.NumBranches | 4752 | 5185 | 433 | 9.11% | 9.11%
simple-loop-unswitch.NumCostMultiplierSkipped | 503 | 925 | 422 | 83.90% | 83.90%
simple-loop-unswitch.NumSwitches | 18 | 20 | 2 | 11.11% | 11.11%
simple-loop-unswitch.NumTrivial | 95 | 179 | 84 | 88.42% | 88.42%


(These statistics are from the vanilla LLVM test-suite plus RawSpeed and darktable.)

This does have an observable compile-time regression of about +0.5% geomean
https://llvm-compile-time-tracker.com/compare.php?from=7c5222e4d1a3a14f029e5f614c9aefd0fa505f1e&to=5d81826c3411982ca26e46b9d0aff34c80577664&stat=instructions
but I think that's basically nothing.

Diff Detail

Event Timeline

lebedev.ri created this revision.Mar 24 2021, 2:39 AM
lebedev.ri requested review of this revision.Mar 24 2021, 2:39 AM
lebedev.ri edited the summary of this revision. (Show Details)
lebedev.ri edited the summary of this revision. (Show Details)Mar 24 2021, 3:29 AM
lebedev.ri edited the summary of this revision. (Show Details)Mar 24 2021, 3:41 AM
lebedev.ri edited the summary of this revision. (Show Details)Mar 24 2021, 4:19 AM

Thanks for working on that. I've tried this patch with the application from which I created the reduced testcase sent to the mailing list and that does fix the extra peeling. Thank you very much. I'll try to review the patch now

> Thanks for working on that. I've tried this patch with the application from which I created the reduced testcase sent to the mailing list and that does fix the extra peeling. Thank you very much. I'll try to review the patch now

Glad to hear it!

LGTM

llvm/test/Other/pass-pipelines.ll
54

Shouldn't that CHECK-NOT be duplicated between each Loop Pass Manager check below as well?

lebedev.ri added inline comments.Mar 24 2021, 5:17 AM
llvm/test/Other/pass-pipelines.ll
54

I don't know.
This seems to be consistent with the previous checks, where there already were several LPMs,
but no CHECK-NOT in between.

nikic requested changes to this revision.Mar 24 2021, 5:21 AM

Just marking this to clarify that approval by @thopre is insufficient. Don't have time to review this right now.

This revision now requires changes to proceed.Mar 24 2021, 5:21 AM

> Just marking this to clarify that approval by @thopre is insufficient. Don't have time to review this right now.

Indeed, I didn't approve it intentionally

llvm/test/Other/pass-pipelines.ll
54

One of them was added a year ago by a different person than the author of the CHECK-NOT. I presume it was probably missed. Does the test pass if you add more CHECK-NOT?

One thing I can throw out now is that currently, we use split LPMs for LoopRotate and LICM, because historically it was too expensive to preserve MSSA in LoopRotate. After your change MSSA needs to be preserved anyway, and we are in the rather silly situation of LICM, LoopRotate and LICM again each using a separate LPM. If we want to go down this route, we'd probably want to land something like D74640 first in order to merge these LPMs again.

lebedev.ri marked 2 inline comments as done.

>> Just marking this to clarify that approval by @thopre is insufficient. Don't have time to review this right now.

> Indeed, I didn't approve it intentionally

Yep, not sure why that needed to be stated explicitly.

llvm/test/Other/pass-pipelines.ll
54

Hm, seems to work.

> One thing I can throw out now is that currently, we use split LPMs for LoopRotate and LICM, because historically it was too expensive to preserve MSSA in LoopRotate. After your change MSSA needs to be preserved anyway, and we are in the rather silly situation of LICM, LoopRotate and LICM again each using a separate LPM. If we want to go down this route, we'd probably want to land something like D74640 first in order to merge these LPMs again.

Notably, D74640 is a legacy-PM problem; I'm not sure how much we should care about it nowadays.
But yes, D74640 does help there somewhat: without vs. with

mkazantsev added inline comments.Mar 24 2021, 10:12 AM
llvm/lib/Passes/PassBuilder.cpp
579

Theoretically LICM should move all invariants out of the loop, and loop rotation should not create any new invariants. Do we really need this again?

thopre added inline comments.Mar 24 2021, 10:19 AM
llvm/lib/Passes/PassBuilder.cpp
579

Loads of size >4 whose base pointer is derived from the `this` pointer fail the isSafeToExecuteUnconditionally() check in LICM, because the alignment of `this` is 1.

lebedev.ri edited the summary of this revision. (Show Details)Mar 24 2021, 10:29 AM

> I.e. we end up with less instructions, more LICM activity (+30% more sunks out of loops!),

This sounds too good to be true ... and it is. The statistic was closer to counting sinking candidates than actually sunk instructions. I've fixed it in 8a168d2d70678164004fca8de78e98bfb6e1272d and would expect this patch to have a much more limited impact on it now.

lebedev.ri marked 2 inline comments as done.Mar 24 2021, 10:34 AM
lebedev.ri added inline comments.
llvm/lib/Passes/PassBuilder.cpp
579

> Theoretically LICM should move all invariants out of the loop, and loop rotation should not create any new invariants. Do we really need this again?

I think the numbers speak for themselves; see the 3rd table in the description I just added,
which shows improvements when going from LICM before LoopRotate to LICM after LoopRotate.
Especially note licm.NumSunk (+30%), and note that there was no such regression
when just moving the LICM to before LoopRotate.

Depending on what the run-time impact is in practice, the additional compile-time may or may not be worth it.
I'm running some additional testing offline on this, please give me a couple of days to test and analyze.

>> I.e. we end up with less instructions, more LICM activity (+30% more sunks out of loops!),

> This sounds too good to be true ... and it is. The statistic was closer to counting sinking candidates than actually sunk instructions. I've fixed it in 8a168d2d70678164004fca8de78e98bfb6e1272d and would expect this patch to have a much more limited impact on it now.

Sure, let me remeasure this.

thopre added inline comments.Mar 24 2021, 10:40 AM
llvm/lib/Passes/PassBuilder.cpp
579

Sorry, I misread: you meant moving LICM before LoopRotate rather than adding a new LICM pass before it.

lebedev.ri edited the summary of this revision. (Show Details)Mar 24 2021, 11:19 AM

>> I.e. we end up with less instructions, more LICM activity (+30% more sunks out of loops!),

> This sounds too good to be true ... and it is. The statistic was closer to counting sinking candidates than actually sunk instructions. I've fixed it in 8a168d2d70678164004fca8de78e98bfb6e1272d and would expect this patch to have a much more limited impact on it now.

Down to 7%.
Also, note that the stats suggest that early LICM alone is worse than two LICM runs:

> ... but that actually regresses LICM (-12% licm.NumMovedLoads),
> loop-simplifycfg (NumLoopExitsDeleted, NumTerminatorsFolded),
> simple-loop-unswitch (NumTrivial).

> In D99249#2647162, @lebedev.ri wrote:
>
> also note how none of those 4 regressions are here.

So i still believe this is what we need.

Looks nice for me, but let's wait for Alina's investigation to get concluded.

llvm/lib/Passes/PassBuilder.cpp
579

I'm not sure whether a change in the licm.NumSunk statistic in either direction is a regression or not. An increase can mean either "we somehow enabled more optimizations" or "we duplicated code and now sink the duplicated values twice". Thinking more about it, I can imagine how rotation can turn a PHI into a sinkable loop invariant. So I guess it's fine to keep LICM after it.

lebedev.ri marked an inline comment as done.Mar 26 2021, 2:04 AM

> Looks nice for me, but let's wait for Alina's investigation to get concluded.

@mkazantsev thank you for taking a look!

llvm/lib/Passes/PassBuilder.cpp
579

Looking at asm-printer.EmittedInsts and instcount.TotalInsts,
we quite clearly end up with fewer instructions in either of
the LICM-LoopRotate and LICM-LoopRotate-LICM configurations
(as compared to the baseline of LoopRotate-LICM),
and LICM-LoopRotate-LICM is clearly the winner overall.
Which is why that is the configuration I'm proposing here.

fhahn added a comment.Mar 26 2021, 2:15 AM

If only we could re-run LICM on the rotated loops :)

> If only we could re-run LICM on the rotated loops :)

Yep, @MaskRay has also been saying that in my pipeline patches :)

> Depending on what the run-time impact is in practice, the additional compile-time may or may not be worth it.
> I'm running some additional testing offline on this, please give me a couple of days to test and analyze.

@asbirlea How is it going?

asbirlea accepted this revision.Mar 31 2021, 10:40 AM

Most performance results look ok. There are a few that are mixed (+ & -), one regression that's still investigated that so far seems architecture specific and not related to this patch, a couple of noticeable improvements and a few marginal improvements.
Please note I have only tested the new pass manager.
So, green light from my side assuming the other reviewers are also on board with this.

@lebedev.ri @thopre Is it right that you have seen the performance gain in practice from this change in larger applications, such as the example with peeling that prompted the first patch?
I'd be curious what the impact for that is.

> Most performance results look ok. There are a few that are mixed (+ & -), one regression that's still investigated that so far seems architecture specific and not related to this patch, a couple of noticeable improvements and a few marginal improvements.
> Please note I have only tested the new pass manager.
> So, green light from my side assuming the other reviewers are also on board with this.

Thanks for running the tests. Sounds like the effect here is relatively minor, but at least it's not negative.

I think at this point there is a pretty clear motivation for why we want to run LICM before LoopRotate, but the motivation for the LICM-LoopRotate-LICM sequence is still somewhat murky to me. Do we have any PhaseOrdering tests that would regress without the second LICM run?

I'm not sure how to write a proper interestingness test for this, but e.g. given this input

(bad example, in the end everything optimizes away)
currently (-looprotate -licm):

*** IR Dump After LICMPass ***
; Preheader:
for.end.lr.ph:                                    ; preds = %entry
  %hue = getelementptr inbounds %"class.rawspeed::Cr2sRawInterpolator", %"class.rawspeed::Cr2sRawInterpolator"* %this, i64 0, i32 3
  %1 = load i32, i32* %hue, align 4, !tbaa !8
  %Cb.i.i = getelementptr inbounds %"struct.std::array.30", %"struct.std::array.30"* %MCUs, i64 0, i32 0, i64 undef, i32 0, i64 undef, i32 1
  %sub.i.i = add i32 %1, -16384
  %Cb.i.i.promoted = load i32, i32* %Cb.i.i, align 4, !tbaa !9
  br label %for.end

; Loop:
for.end:                                          ; preds = %for.end.lr.ph, %for.end
  %2 = phi i32 [ %Cb.i.i.promoted, %for.end.lr.ph ], [ %add.i.i, %for.end ]
  %MCUIdx.03 = phi i32 [ undef, %for.end.lr.ph ], [ %inc21, %for.end ]
  %add.i.i = add i32 %sub.i.i, %2
  %inc21 = add nsw i32 %MCUIdx.03, 1
  %cmp = icmp slt i32 %inc21, %sub
  br i1 %cmp, label %for.end, label %for.cond.for.end22_crit_edge, !llvm.loop !11

; Exit blocks
for.cond.for.end22_crit_edge:                     ; preds = %for.end
  %add.i.i.lcssa = phi i32 [ %add.i.i, %for.end ]
  store i32 %add.i.i.lcssa, i32* %Cb.i.i, align 4, !tbaa !9
  br label %for.end22

with -licm -loop-rotate:

*** IR Dump After LoopRotatePass ***
; Preheader:
for.end.lr.ph:                                    ; preds = %entry
  br label %for.end

; Loop:
for.end:                                          ; preds = %for.end.lr.ph, %for.end
  %MCUIdx.03 = phi i32 [ undef, %for.end.lr.ph ], [ %inc21, %for.end ]
  %1 = load i32, i32* %hue, align 4, !tbaa !8
  %2 = load i32, i32* %Cb.i.i, align 4, !tbaa !9
  %sub.i.i = add i32 %1, -16384
  %add.i.i = add i32 %sub.i.i, %2
  store i32 %add.i.i, i32* %Cb.i.i, align 4, !tbaa !9
  %inc21 = add nsw i32 %MCUIdx.03, 1
  %cmp = icmp slt i32 %inc21, %sub
  br i1 %cmp, label %for.end, label %for.cond.for.end22_crit_edge, !llvm.loop !11

; Exit blocks
for.cond.for.end22_crit_edge:                     ; preds = %for.end
  br label %for.end22

and with -licm -loop-rotate -licm:

*** IR Dump After LICMPass ***
; Preheader:
for.end.lr.ph:                                    ; preds = %entry
  %1 = load i32, i32* %hue, align 4, !tbaa !8
  %sub.i.i = add i32 %1, -16384
  %Cb.i.i.promoted = load i32, i32* %Cb.i.i, align 4, !tbaa !9
  br label %for.end

; Loop:
for.end:                                          ; preds = %for.end.lr.ph, %for.end
  %2 = phi i32 [ %Cb.i.i.promoted, %for.end.lr.ph ], [ %add.i.i, %for.end ]
  %MCUIdx.03 = phi i32 [ undef, %for.end.lr.ph ], [ %inc21, %for.end ]
  %add.i.i = add i32 %sub.i.i, %2
  %inc21 = add nsw i32 %MCUIdx.03, 1
  %cmp = icmp slt i32 %inc21, %sub
  br i1 %cmp, label %for.end, label %for.cond.for.end22_crit_edge, !llvm.loop !11

; Exit blocks
for.cond.for.end22_crit_edge:                     ; preds = %for.end
  %add.i.i.lcssa = phi i32 [ %add.i.i, %for.end ]
  store i32 %add.i.i.lcssa, i32* %Cb.i.i, align 4, !tbaa !9
  br label %for.end22

Note how -licm -loop-rotate causes %sub.i.i to be computed in-loop.
Let me know if that isn't a sufficient answer.

> Most performance results look ok. There are a few that are mixed (+ & -), one regression that's still investigated that so far seems architecture specific and not related to this patch, a couple of noticeable improvements and a few marginal improvements.
> Please note I have only tested the new pass manager.
> So, green light from my side assuming the other reviewers are also on board with this.
>
> @lebedev.ri @thopre Is it right that you have seen the performance gain in practice from this change in larger applications, such as the example with peeling that prompted the first patch?

Yes, we had an ML benchmark regress by about 4% because the peeling prevented vectorization due to loss of alignment on our target. I've tried both LICM-looprotate-LICM and LICM-looprotate, and both fix the regression.

> I'm not sure how to write a proper interestingness test for this, but e.g. given this input
>
> (bad example, in the end everything optimizes away)
> <...>
> Note how -licm -loop-rotate causes %sub.i.i to be computed in-loop.
> Let me know if that isn't a sufficient answer.

Or more visibly: https://godbolt.org/z/GzEbacs4K

To the best of my knowledge, all review feedback has been addressed here.
I will be landing this tomorrow, if nothing new blocking appears before then.
Thanks!

nikic added a comment.Apr 1 2021, 2:19 PM

Here's a reduced version of @lebedev.ri's example:

define void @test(i32* dereferenceable(4) %ptr, i32 %size) {
entry:                                     
  br label %for.cond
    
for.cond:
  %iv = phi i32 [ 0, %entry ], [ %iv.next, %for.cont ]
  %cmp = icmp ult i32 %iv, %size
  br i1 %cmp, label %for.cont, label %for.end

for.cont:
  %val = load i32, i32* %ptr, align 4
  %iv.next = add nsw i32 %iv, 1
  call void @use(i32 %val) inaccessiblememonly
  br label %for.cond

for.end:
  ret void
}

declare void @use(i32)

The problem is that the load is not speculatable due to the lack of an align attribute. After -loop-rotate we have:

define void @test(i32* align 4 dereferenceable(4) %ptr, i32 %size) {
entry:
  %cmp1 = icmp ult i32 0, %size
  br i1 %cmp1, label %for.cont.lr.ph, label %for.end

for.cont.lr.ph:                                   ; preds = %entry
  br label %for.cont

for.cont:                                         ; preds = %for.cont.lr.ph, %for.cont
  %iv2 = phi i32 [ 0, %for.cont.lr.ph ], [ %iv.next, %for.cont ]
  %val = load i32, i32* %ptr, align 4
  %iv.next = add nsw i32 %iv2, 1
  call void @use(i32 %val) #0
  %cmp = icmp ult i32 %iv.next, %size
  br i1 %cmp, label %for.cont, label %for.cond.for.end_crit_edge

for.cond.for.end_crit_edge:                       ; preds = %for.cont
  br label %for.end

for.end:                                          ; preds = %for.cond.for.end_crit_edge, %entry
  ret void
}

in which case the load is executed unconditionally (if the loop is executed) and can be hoisted. Thus the need for -licm after -loop-rotate.

Based on @thopre's comments, it seems like the lack of an align attribute on C++ `this` pointers was also the problem for his case. It's possible that I'm missing some C++ subtlety here, but I believe the lack of an align attribute here is just an oversight. At some point, we changed LLVM to no longer assume that pointers without alignment information have natural alignment, and clang was adjusted in D80166 to emit align attributes for dereferenceable pointers to compensate for that change. However, the align attributes were added only for reference types, not for `this` pointers. Notably, https://github.com/llvm/llvm-project/blob/17800f900dca8243773dec5f90578cce03069b8f/clang/lib/CodeGen/CGCall.cpp#L2310 is missing the align attribute. We're probably missing optimizations not just in LICM because of that.

Yep, that looks suspect.
I'll see if I can follow up on that.
Thanks!

This revision was not accepted when it landed; it landed in state Needs Review.Apr 2 2021, 1:12 AM
This revision was automatically updated to reflect the committed changes.

Maybe this caused https://bugs.llvm.org/show_bug.cgi?id=50550. Could you check and comment on it?

tra added a subscriber: tra.Oct 8 2021, 1:36 PM

FYI, this appears to introduce a somewhat serious performance regression for NVPTX:
https://bugs.llvm.org/show_bug.cgi?id=52037

It's not necessarily a bug, but rather an unfortunate interaction: the loop transforms block SROA, which hits a performance-sensitive quirk of NVPTX, where local storage is actually quite expensive.