This is an archive of the discontinued LLVM Phabricator instance.

[Pipelines] Introduce SROA after (final, full) loop unrolling
ClosedPublic

Authored by lebedev.ri on Oct 26 2022, 5:53 PM.

Details

Summary

I am surprised that we didn't do this already.

While it would be preferable not to introduce yet another SROA invocation,
and instead move the one from PassBuilder::buildFunctionSimplificationPipeline(),
the existing test coverage says that is a bad idea,
even though it would be fine compile-time-wise: https://llvm-compile-time-tracker.com/compare.php?from=b150d34c47efbd8fa09604bce805c0920360f8d7&to=5a9a5c855158b482552be8c7af3e73d67fa44805&stat=instructions

So instead, I add yet another SROA run.
I have checked, and it needs to come at least after said full loop unrolling,
but I suppose placing it after InstCombine does not hurt.
Surprisingly, this is still fine compile-time-wise: https://llvm-compile-time-tracker.com/compare.php?from=70324cd88328c0924e605fa81b696572560aa5c9&to=fb489bbef687ad821c3173a931709f9cad9aee8a&stat=instructions
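
To make the placement concrete, here is a rough sketch of the kind of change being discussed, assuming the usual new-pass-manager idioms of the time; the pass-manager name, headers, and (lack of) SROA constructor arguments are illustrative rather than copied from the actual diff, and have changed across LLVM versions:

  #include "llvm/IR/PassManager.h"
  #include "llvm/Transforms/InstCombine/InstCombine.h"
  #include "llvm/Transforms/Scalar/SROA.h"
  using namespace llvm;

  // The module optimization pipeline already ends with the final (runtime)
  // LoopUnrollPass followed by a cleanup InstCombinePass, roughly:
  //   OptimizePM.addPass(LoopUnrollPass(...));
  //   OptimizePM.addPass(InstCombinePass());
  // The proposal adds one more SROA run after that point, so that allocas
  // whose indices became constant during the final unroll still get split
  // and promoted.
  FunctionPassManager OptimizePM; // stands in for the pipeline being built
  OptimizePM.addPass(SROAPass());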

Now, something in that link makes me cringe.
Compare it with https://llvm-compile-time-tracker.com/compare.php?from=70324cd88328c0924e605fa81b696572560aa5c9&to=d534377c0324a63ac26b990967577562e28a148f&stat=instructions
Those two commits share the same base and do not differ functionally,
yet that +0.23% 'regression' shows up in one of them.

I've encountered this in real code; SROA-after-final-loop-unrolling.ll has been reduced from https://godbolt.org/z/fsdMhETh3

Diff Detail

Event Timeline

lebedev.ri created this revision. Oct 26 2022, 5:53 PM
Herald added a project: Restricted Project. Oct 26 2022, 5:53 PM
lebedev.ri requested review of this revision. Oct 26 2022, 5:53 PM

Some context here: LLVM performs unrolling in two places. There is a full unroll as part of the function simplification pipeline (i.e. interleaved with inlining) and a runtime unroll at the end of the module simplification pipeline. The general expectation is that if a loop is going to be fully unrolled, then this should happen during the full unroll pass. We run SROA after full unroll, as well as a significant further optimization pipeline, and do this pre-inlining, so the inlining cost model is correct. Conversely, the final runtime unroll is not supposed to expose significant further optimization opportunities -- it is a late pre-codegen pass.

Of course, runtime unrolling can end up performing full unrolling, for example if the trip count could not yet be determined at the time of full unrolling, but can be determined at the time of runtime unrolling, and this patch can help in such situations. I have encountered cases like this quite a few times with Rust iterator adaptors, but my conclusion from analyzing them was generally that the proper way to address this is to make the trip count computable at full unroll, because this integrates properly with the remaining pipeline (and is the reason why we have that separate full unroll pass in the first place). For example, I've had cases where the code after unrolling reduced to the moral equivalent of "a + b", which is something we want to happen before inlining, not after. The patch at https://reviews.llvm.org/D133192 is motivated by one such case (it allows LICM to do more scalar promotion, which allows SCEV to compute the trip count before full unroll, which ends up reducing the loop to something trivial).
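
To illustrate the kind of case being described (a hypothetical C++ analogue, not taken from this review or from the Rust code in question): once the trip count is a known constant at full-unroll time, the loop below collapses, via full unroll plus SROA/mem2reg, to the moral equivalent of xs[0] + xs[1], and the inliner then costs an accurately tiny function:

  #include <array>

  // The trip count (2) is known before the full-unroll pass runs, so the
  // loop is unrolled early, the accumulator is trivially promoted, and what
  // the inliner sees is essentially "a + b".
  int sum_pair(const std::array<int, 2> &xs) {
    int acc = 0;
    for (int x : xs)
      acc += x;
    return acc;
  }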

As such, I'm not convinced that this is the right way to go about it. I would suggest to at least analyze the pre-full-unroll IR in your motivating cases and check whether there is anything obvious that can be done to already enable unrolling at that stage, rather than in the late pipeline.

nikic added a comment. Oct 27 2022, 1:02 AM

Those two commits share the same base and do not differ functionally, yet that +0.23% 'regression' shows up in one of them.

Yes, the instructions metric is more noisy than I would like nowadays. The consumer-typeset, ClamAV and SPASS benchmarks in particular have this kind of bimodal noise, for whatever reason. I should probably switch the default metric to instructions:u, which doesn't have this problem.

As such, I'm not convinced that this is the right way to go about it. I would suggest to at least analyze the pre-full-unroll IR in your motivating cases and check whether there is anything obvious that can be done to already enable unrolling at that stage, rather than in the late pipeline.

I did. This is not a SCEV/LoopUnroll limitation, but the usual genius of our arbitrarily random cut-offs.
The alternative is to bump "unroll-max-iteration-count-to-analyze" to at least 17 and "unroll-threshold-aggressive" to at least 337.
https://godbolt.org/z/PMe6qhaKs
I can't imagine this won't have worse implications for the compile time. Is that preferred?

lebedev.ri added a comment. Edited Oct 27 2022, 7:23 AM


Also, why "unroll-max-iteration-count-to-analyze" is iteration-based?
After a brief look, i'm not seeing a complexity/instruction count cut-off (other than MaxUnrolledLoopSize)
which means it would discard 11-iteration 2-instruction loop, but accept 10-iteration 1000-instruction loop?
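
As a hypothetical illustration of that asymmetry (the loops below are made up, not from the review): a purely iteration-based gate (say, 10) refuses to even analyze the first loop despite its tiny body, while the second one, with ten large iterations, is still considered and only bounded later by the unrolled-size threshold:

  // 11 iterations, roughly 2 instructions of body: rejected up front by an
  // iteration-count cut-off, before any cost analysis runs.
  int tiny_body_many_iters(const int *p) {
    int s = 0;
    for (int i = 0; i < 11; ++i)
      s += p[i];
    return s;
  }

  // 10 iterations, but a much bigger body: passes the iteration gate and is
  // only limited afterwards by the total unrolled-size threshold.
  int big_body_few_iters(const int *p) {
    int s = 0;
    for (int i = 0; i < 10; ++i) {
      s += p[i] * 3;
      s ^= (p[i] >> 1) + i;
      s -= p[i] & 7;
      s *= 2;
      // ...imagine many more instructions here...
    }
    return s;
  }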

Conversely, the final runtime unroll is not supposed to expose significant further optimization opportunities -- it is a late pre-codegen pass.

Also, this reasoning is not correct: full unrolling does not perform partial unrolling,
and yet partial unrolling absolutely can expose further SROA opportunities;
see SROA-after-final-loop-unrolling-2.ll.
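
For a concrete shape of code where this can happen (hypothetical, not the reduced test case): the loop-variant index into the small local array below blocks SROA, but once the loop is partially unrolled by the array size, each copy of the body indexes a constant slot, which is exactly what SROA needs to split the aggregate into scalars:

  // Whether the real pipeline catches this depends on the chosen unroll
  // factor and cost model; the point is only that after unrolling the
  // indices can become constants.
  int two_bucket_sum(const int *p, int n) {
    int buckets[2] = {0, 0};      // small local aggregate
    for (int i = 0; i < n; ++i)
      buckets[i & 1] += p[i];     // variable index defeats SROA here
    return buckets[0] - buckets[1];
  }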


Post-meeting ping.
It would seem nobody has strong feelings on this.
Would anyone like to block this based on the compile time impact?

arsenm added a subscriber: arsenm. Nov 14 2022, 3:53 PM

I think SROA after unroll is important. It's practically the main reason to unroll.

lebedev.ri retitled this revision from [Pipelines] Introduce SROA after (full) loop unrolling to [Pipelines] Introduce SROA after (final, full) loop unrolling.
lebedev.ri added a reviewer: arsenm.

I think SROA after unroll is important. It's practically the main reason to unroll.

Yay, finally some recognition.
Would you like to stamp? :)

Herald added a project: Restricted Project. Nov 14 2022, 4:53 PM


llvm/lib/Passes/PassBuilderPipelines.cpp
1201–1202

As the cross-revision diff shows, running SROA before InstCombine, not unexpectedly, makes more sense.

IIUC the compile-time impact of adding another SROA run (the one outside LTO) is negligible?

Regarding the principle of adding another pass, and where in the pipeline, we're still on a case-by-case basis. We had a discussion/round table at the LLVM Dev Meeting on documenting the current new-pass-manager pipeline, precisely for understanding pass dependencies and being able to get resolutions on cases such as this. I'm still putting together the summary for that, but it should be up on Discourse this week.

Testing this change in parallel - I'll follow up tomorrow.

IIUC the compile-time impact of adding another SROA run (the one outside LTO) is negligible?

Yup.

Regarding the principle of adding another pass, and where in the pipeline, we're still on a case-by-case basis.

We really can't achieve the same effect here in any other reasonable way without penalizing the inliner pipeline.

Testing this change in parallel - I'll follow up tomorrow.

Thanks.

spatel accepted this revision. Nov 16 2022, 1:55 PM

LGTM based on the timing, results, and alternatives discussed.
There's some testing in progress according to previous comments, so best to wait for that to finish in case it turns up anything new.

llvm/lib/Passes/PassBuilderPipelines.cpp
1123

Is there a reason to put this down here vs. tacking it on the end of the previous IsFullLTO block?
If LoopUnroll is the reason for adding SROA, then mention that specifically in the comment?

IIRC, all of the FullLTO predicates in this set of passes were questionable (see TODO comment above this function). They just accumulated because the code was duplicated and diverged over time.

This revision is now accepted and ready to land. Nov 16 2022, 1:55 PM

LGTM based on the timing, results, and alternatives discussed.

Thank you for the review.

There's some testing in progress according to previous comments, so best to wait for that to finish in case it turns up anything new.

llvm/lib/Passes/PassBuilderPipelines.cpp
1123

Is there a reason to put this down here vs. tacking it on the end of the previous IsFullLTO block?

I don't have any particular reason for this. Will change.

If LoopUnroll is the reason for adding SROA, then mention that specifically in the comment?

Will do.

IIRC, all of the FullLTO predicates in this set of passes were questionable (see TODO comment above this function). They just accumulated because the code was duplicated and diverged over time.

Yeah, I remember all that. This is indeed ugly.

lebedev.ri marked an inline comment as done.

Adjust the full LTO pipeline, for symmetry with the non-full-LTO pipeline.
Looks like there is a test-coverage shortage.

As far as performance, this looks fine. Not seeing measurable gains.

As far as performance, this looks fine. Not seeing measurable gains.

Thank you for checking!
Is the evaluation still ongoing, or is this the green light to go ahead?

Green light perf-wise.
I cannot comment on whether the position is "the right" one though. I'm deferring to the other reviewers.

Green light perf-wise.
I cannot comment on whether the position is "the right" one though. I'm deferring to the other reviewers.

Thank you!

This revision was landed with ongoing or failed builds. Nov 17 2022, 10:35 AM
This revision was automatically updated to reflect the committed changes.
dyung added a subscriber: dyung. Nov 21 2022, 9:03 PM

Hi @lebedev.ri, we recently noticed a failure in one of our internal tests which I bisected back to your change. I have filed the details in Issue #59122. Can you take a look?