This is an archive of the discontinued LLVM Phabricator instance.

Differential D87972

[OldPM] Pass manager: run SROA after (simple) loop unrolling
Closed, Public

Authored by lebedev.ri on Sep 19 2020, 11:25 AM.


Details

Reviewers

fhahn
rampitec
arsenm
MaskRay
craig.topper
Carrot
hliao
nikic
xbolva00
RKSimon

Commits

rG03bd5198b6f7: [OldPM] Pass manager: run SROA after (simple) loop unrolling

Summary

I have stumbled into this pretty accidentally, when rewriting some spaghetti-like code
into something more structured, which involved using some std::array<>s.
And to my surprise, the allocas remained, causing about +160% perf regression.

https://llvm-compile-time-tracker.com/compare.php?from=bb6f4d32aac3eecb51909f4facc625219307ee68&to=d563e66f40f9d4d145cb2050e41cb961e2b37785&stat=instructions
suggests that this has geomean compile-time cost of +0.08%.

This fixes PR40011, PR42794 and probably some other reports.
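For illustration, the kind of source that runs into this can be sketched as follows. This is a hypothetical reconstruction inferred from the mangled names `_Z3fooi` and `_Z3usei` in the test added by this revision, not the author's original code; the recording `use` function is an assumption added so the behaviour can be observed.

```cpp
#include <array>
#include <vector>

// Hypothetical stand-in for the opaque use(int) callee in the test.
// It records its argument so the behaviour can be checked.
static std::vector<int> seen;
void use(int v) { seen.push_back(v); }

void foo(int cnt) {
  std::array<int, 6> arr;
  // First loop: fill the array. Once fully unrolled, this becomes six
  // stores to constant offsets within the stack slot.
  for (int &elt : arr)
    elt = ++cnt;
  // Second loop: read every element back. After both loops are
  // unrolled, each load has a matching earlier store, so a later SROA
  // run can replace the array with six scalar values.
  for (int elt : arr)
    use(elt);
}
```

Calling `foo(10)` passes 11 through 16 to `use`, which mirrors the six `add nsw` / `call` pairs the updated test expects once no alloca is left.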

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	60 ms	linux > Clang.CodeGenCXX::union-tbaa2.cpp
	80 ms	linux > Clang.Misc::loop-opt-setup.c
	90 ms	windows > Clang.CodeGenCXX::union-tbaa2.cpp
	190 ms	windows > Clang.Misc::loop-opt-setup.c

Event Timeline

lebedev.ri created this revision. Sep 19 2020, 11:25 AM

Herald added subscribers: nikic, kerbowa, zzheng and 3 others. Sep 19 2020, 11:25 AM

lebedev.ri requested review of this revision. Sep 19 2020, 11:25 AM

Herald added a subscriber: wdng. Sep 19 2020, 11:25 AM

Resolves also https://bugs.llvm.org/show_bug.cgi?id=42794?

But +1 for this change, we should have late sroa.

Also check llvm/test/Other/unroll-sroa.ll

In D87972#2283683, @xbolva00 wrote:

Resolves also https://bugs.llvm.org/show_bug.cgi?id=42794?

Looks like it does, yes.

But +1 for this change, we should have late sroa.

Also check llvm/test/Other/unroll-sroa.ll

check-llvm passes, what should i look for specifically?
(didn't look at clang tests yet)

lebedev.ri added reviewers: RKSimon, craig.topper. Sep 19 2020, 11:53 AM

I meant adding a RUN line for OLDPM in unroll-sroa.ll.

lebedev.ri edited the summary of this revision. Sep 19 2020, 11:54 AM

Maybe this could help https://bugs.llvm.org/show_bug.cgi?id=47439 too?

geomean compile-time cost of +0.08%

Thanks. This is totally fine.

lebedev.ri edited the summary of this revision. Sep 19 2020, 11:58 AM

Harbormaster completed remote builds in B72283: Diff 292980. Sep 19 2020, 11:59 AM

In D87972#2283701, @xbolva00 wrote:

Maybe this could help https://bugs.llvm.org/show_bug.cgi?id=47439 too?

I'm afraid it doesn't seem to help that one, no.

Fixing a few clang tests and updating one more llvm test to check this also.

Herald added a project: Restricted Project. Sep 19 2020, 12:37 PM

Herald added a subscriber: cfe-commits.

Harbormaster completed remote builds in B72286: Diff 292984. Sep 19 2020, 1:19 PM

xbolva00 added a reviewer: Carrot. Sep 19 2020, 10:40 PM

Surprising that this causes only such a small perf regression. I guess it should be OK given that, but there are probably some pathological cases out there where this may cause noticeable compile-time regressions.

IIUC the additional cases this catches come mainly from fully unrolled loops. If only we had a better way to conditionally run passes :) Then we would ideally only run SROA (and other additional simplification passes) late on functions that had loops fully unrolled. One lightweight way to do so would be to have loop unroll add an 'additional-simplification' attribute to functions that contain loops which it fully unrolled, and have SROA just run again late if the attribute is present. A similar approach may be helpful in other places too (e.g. the off-by-default -extra-vectorizer-passes option, https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/IPO/PassManagerBuilder.cpp#L774)
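The attribute-based idea above can be sketched with a toy model. Nothing here is LLVM API: `ToyFunction`, `runUnroll`, `runLateSROAIfRequested` and `runPipeline` are hypothetical names, and only the attribute string is taken from the comment. It is a minimal sketch of the control flow, assuming the unroller is the only pass that requests the extra cleanup.

```cpp
#include <set>
#include <string>
#include <vector>

// Toy stand-in for a function in a module; NOT llvm::Function.
struct ToyFunction {
  std::string name;
  bool hasFullyUnrollableLoop = false;
  std::set<std::string> attrs;   // string attributes on the function
  std::vector<std::string> log;  // names of passes that did work
};

// Attribute name as proposed in the review comment.
static const char *const kExtraSimplify = "additional-simplification";

// Toy "loop unroll": tags the function when it fully unrolls a loop.
void runUnroll(ToyFunction &f) {
  f.log.push_back("unroll");
  if (f.hasFullyUnrollableLoop) {
    f.hasFullyUnrollableLoop = false;
    f.attrs.insert(kExtraSimplify);
  }
}

// Toy "late SROA": runs only if the unroller requested it, and
// consumes the attribute so the request does not persist.
void runLateSROAIfRequested(ToyFunction &f) {
  if (f.attrs.erase(kExtraSimplify) == 0)
    return;  // nothing was fully unrolled; skip the extra run
  f.log.push_back("sroa");
}

void runPipeline(ToyFunction &f) {
  runUnroll(f);
  runLateSROAIfRequested(f);
}
```

In this model a function without a fully unrolled loop pays only for the attribute lookup, which is the compile-time saving the comment is after; whether that is worthwhile in the real pipeline is the open question discussed in the rest of the thread.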

lebedev.ri edited the summary of this revision. Sep 20 2020, 3:56 AM

lebedev.ri edited the summary of this revision. Sep 20 2020, 4:01 AM

https://reviews.llvm.org/D68593 added late SROA to NPM so it would be good to enable it for LPM as well.

I have tested this patch internally and seen gains and losses. On one document-search-related benchmark there is a 3~5% improvement. On zippy (snappy) there is a 3~5% regression. Perhaps we do need a conditional extra SROA run.

In D87972#2284060, @MaskRay wrote:

I have tested this patch internally and seen gains and losses. On one document-search-related benchmark there is a 3~5% improvement. On zippy (snappy) there is a 3~5% regression. Perhaps we do need a conditional extra SROA run.

Should be same story for NPM since NPM also enables SROA after unrolling.

A) Commit this patch and start working on general solution for LPM and NPM.

B) Ignore this patch. But after LLVM switches to NPM, you have same issue to solve anyway.

(I'm guessing that we are talking about run-time performance here.)

In D87972#2284060, @MaskRay wrote:

I have tested this patch internally and seen gains and losses. On one document-search-related benchmark there is a 3~5% improvement. On zippy (snappy) there is a 3~5% regression.

Yep, as usual.

Perhaps we do need a conditional extra SROA run.

I think i don't understand the gist

If we don't run it in the cases we expect it wouldn't do anything,
it should still be run in the cases where it *does* do something,
so i'm not sure how conditioning its run helps with anything.
(well, other than compile-time)

In D87972#2284060, @MaskRay wrote:

I have tested this patch internally and seen gains and losses. On one document-search-related benchmark there is a 3~5% improvement. On zippy (snappy) there is a 3~5% regression. Perhaps we do need a conditional extra SROA run.

Snappy - you mean public https://github.com/google/snappy?

Well, it should be possible to analyze it...

@lebedev.ri any perf data from testsuite/rawspeed?

spatel added a subscriber: spatel. Sep 20 2020, 10:43 AM

In D87972#2284064, @xbolva00 wrote:

In D87972#2284060, @MaskRay wrote:

I have tested this patch internally and seen gains and losses. On one document-search-related benchmark there is a 3~5% improvement. On zippy (snappy) there is a 3~5% regression. Perhaps we do need a conditional extra SROA run.

Snappy - you mean public https://github.com/google/snappy?

Well, it should be possible to analyze it...

@lebedev.ri any perf data from testsuite/rawspeed?

I did look.

sroa-after-unroll.rsbench.txt (68 KB)

This suggests that geomean is -0.8% runtime improvement,
with ups&downs.

But as i have said in the patch's description, i stumbled into this when writing new code, where the effect is much larger.
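For readers unfamiliar with how a geomean figure like the one above is formed, here is a minimal sketch. It assumes the per-benchmark inputs are runtime ratios (new time divided by old time); the function names are hypothetical and the attached benchmark file is not parsed here.

```cpp
#include <cmath>
#include <vector>

// Geometric mean of per-benchmark runtime ratios (new / old).
// A result below 1.0 means the new build is faster on balance.
// Precondition: ratios is non-empty and every ratio is positive.
double geomeanRatio(const std::vector<double> &ratios) {
  double logSum = 0.0;
  for (double r : ratios)
    logSum += std::log(r);
  return std::exp(logSum / static_cast<double>(ratios.size()));
}

// Convert a ratio to the signed percentage used in the thread,
// so that a ratio of 0.992 is reported as roughly -0.8 (%).
double percentChange(double ratio) { return (ratio - 1.0) * 100.0; }
```

Because the mean is taken over logarithms, a 2x slowdown on one benchmark and a 2x speedup on another cancel exactly, which is why a small geomean change can coexist with noticeable individual ups and downs.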

I assume this makes 1f4e7463b5e3ff654c84371527767830e51db10d redundant?

In D87972#2285176, @arsenm wrote:

I assume this makes 1f4e7463b5e3ff654c84371527767830e51db10d redundant?

Yes, see llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp change.

xbolva00 added inline comments. Sep 21 2020, 5:55 AM

clang/test/Misc/loop-opt-setup.c
2 ↗	(On Diff #292984)	OLDPM?

In D87972#2284096, @lebedev.ri wrote:

In D87972#2284064, @xbolva00 wrote:

In D87972#2284060, @MaskRay wrote:

I have tested this patch internally and seen gains and losses. On one document-search-related benchmark there is a 3~5% improvement. On zippy (snappy) there is a 3~5% regression. Perhaps we do need a conditional extra SROA run.

Snappy - you mean public https://github.com/google/snappy?

Well, it should be possible to analyze it...

@lebedev.ri any perf data from testsuite/rawspeed?

I did look.
sroa-after-unroll.rsbench.txt (68 KB)

This suggests that geomean is -0.8% runtime improvement,
with ups&downs.

But as i have said in the patch's description, i stumbled into this when writing new code, where the effect is much larger.

We probably need to collect more performance data of more benchmarks on more platforms (different targets) to understand the impact. I hesitate to add https://reviews.llvm.org/rG1f4e7463b5e3ff654c84371527767830e51db10d as a generic one as some targets may have regressions due to potentially very different memory access patterns.

This is obviously LGTM from the AMDGPU BE point of view, we did it ourselves.

X86 data collected by @lebedev.ri looks good as well.

@dmgreen for arm?:)

xbolva00 added a subscriber: dmgreen. Sep 21 2020, 12:03 PM

In D87972#2284060, @MaskRay wrote:

I have tested this patch internally and seen gains and losses. On one document-search-related benchmark there is a 3~5% improvement. On zippy (snappy) there is a 3~5% regression. Perhaps we do need a conditional extra SROA run.

That's a bit of a surprise to me! It would be great to know how/why running SROA later makes things worse in some cases.

@dmgreen for arm?:)

This would seem more like a good general codegen cleanup than something that would be target dependent. It would probably be more dependent on the code that is being run than on the exact target. But yeah, I ran some baremetal tests. Only one changed (including in all the codesize tests), which was a nasty complicated state machine. It changed between -5% and +3.5%, depending on the CPU it ran on. (Unfortunately it went down on the cores I was more interested in).

I have run into this exact problem recently and very nearly put up a very similar patch for it. In that case it was making intrinsic MVE code much easier to write, as you could rely on loops not clogging up stack array uses after they were unrolled. The differences were not quite in the 160% range, but they would be nice improvements.

So, a little reluctantly, this does sound like a good idea to me. I asked Sanne to run Spec too if he has the time.

SPEC 2017 on AArch64 is neutral on the geomean. The only slight worry is omnetpp with a 1% regression, but this is balanced by a 0.8% improvement on mcf. Other changes are in the noise.

RKSimon resigned from this revision. Sep 23 2020, 2:18 AM

@MaskRay, @dmgreen & @sanwou01 thank you for running perf experiment!

I think all the results are consistent along the lines of "this sounds
generally reasonable (esp. given that new-pm does it already),
as usual results in ups&downs, but seems to be a (small) geomean win overall".

Does that sound reasonable?
What are the next suggested steps?

Does that sound reasonable?

Yes IMHO.

What are the next suggested steps?

It would be great to isolate and check the cases which regressed a bit.

In D87972#2294488, @xbolva00 wrote:

Does that sound reasonable?

Yes IMHO.

What are the next suggested steps?

It would be great to isolate and check the cases which regressed a bit.

I've rerun my benchmark, and while the results are still the same (runtime geomean -0.53%/-0.40%,
but that obviously depends on the benchmarks), there are some obvious outliers:

rsbench.txt (616 KB)

I'll try to take a look at that, assuming it's not noise.

Rebased, NFC

Herald added a subscriber: pengfei. Oct 2 2020, 6:51 AM

lebedev.ri marked an inline comment as done. Oct 2 2020, 6:53 AM

Harbormaster completed remote builds in B73789: Diff 295817. Oct 2 2020, 7:03 AM

In D87972#2294614, @lebedev.ri wrote:

In D87972#2294488, @xbolva00 wrote:

Does that sound reasonable?

Yes IMHO.

What are the next suggested steps?

It would be great to isolate and check the cases which regressed a bit.

I've rerun my benchmark, and while the results are still the same (runtime geomean -0.53%/-0.40%,
but that obviously depends on the benchmarks), there are some obvious outliers:

rsbench.txt (616 KB)

I'll try to take a look at that, assuming it's not noise.

Hmm. So i did just take a look, manually re-benchmarking each of these, and while i still see a few small improvements,
the regressions there all appear to be basically noise. Not what i was hoping for :/

In D87972#2284060, @MaskRay wrote:

I have tested this patch internally and seen gains and losses. On one document-search-related benchmark there is a 3~5% improvement. On zippy (snappy) there is a 3~5% regression. Perhaps we do need a conditional extra SROA run.

Does it look like one of the scary "branch predictor got confused"/"code layout changed causing different alignment"?

I'm not really sure what are my potential next steps here.

I'm not really sure what are my potential next steps here.

Maybe just add option to disable late SROA?

Re-fix clang tests

Harbormaster completed remote builds in B73893: Diff 296005. Oct 3 2020, 2:11 PM

I'll just say this LGTM as it establishes parity with what NewPM has been doing for a while already.

Reviewers, in the future, please reject any patches that only change the NewPM pipeline or only change the LegacyPM pipeline, unless there is some good technical reason to do so. If there was one here, it was not mentioned in the original patch.

clang/test/CodeGenCXX/union-tbaa2.cpp
1 ↗	(On Diff #296005)	Remove `-fno-experimental-new-pass-manager` ? It was added to work around the NewPM/LegacyPM discrepancy.
clang/test/Misc/loop-opt-setup.c
2 ↗	(On Diff #292984)	Remove the NewPM/OldPM tests now that behavior is the same?

This revision is now accepted and ready to land. Oct 4 2020, 1:10 AM

I'll just say this LGTM as it establishes parity with what NewPM has been doing for a while already.

In D87972#2310603, @xbolva00 wrote:

In D87972#2310595, @nikic wrote:

I'll just say this LGTM as it establishes parity with what NewPM has been doing for a while already.

+1

Thank you.
I'm gonna just land this as is then.

Closed by commit rG03bd5198b6f7: [OldPM] Pass manager: run SROA after (simple) loop unrolling (authored by lebedev.ri). Oct 4 2020, 1:54 AM

This revision was automatically updated to reflect the committed changes.

lebedev.ri added a commit: rG03bd5198b6f7: [OldPM] Pass manager: run SROA after (simple) loop unrolling.

Revision Contents

Path	Size
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp	8 lines
llvm/lib/Transforms/IPO/PassManagerBuilder.cpp	3 lines
llvm/test/Other/opt-O2-pipeline.ll	6 lines
llvm/test/Other/opt-O3-pipeline-enable-matrix.ll	2 lines
llvm/test/Other/opt-O3-pipeline.ll	2 lines
llvm/test/Other/opt-Os-pipeline.ll	2 lines
llvm/test/Transforms/PhaseOrdering/X86/SROA-after-loop-unrolling.ll	62 lines

Diff 292980

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

[473 lines not shown]
[EnableOpt](const PassManagerBuilder &, legacy::PassManagerBase &PM) {
// and before other cleanup optimizations.		// and before other cleanup optimizations.
PM.add(createAMDGPULowerKernelAttributesPass());		PM.add(createAMDGPULowerKernelAttributesPass());

// Promote alloca to vector before SROA and loop unroll. If we manage		// Promote alloca to vector before SROA and loop unroll. If we manage
// to eliminate allocas before unroll we may choose to unroll less.		// to eliminate allocas before unroll we may choose to unroll less.
if (EnableOpt)		if (EnableOpt)
PM.add(createAMDGPUPromoteAllocaToVector());		PM.add(createAMDGPUPromoteAllocaToVector());
});		});

Builder.addExtension(
PassManagerBuilder::EP_LoopOptimizerEnd,
[](const PassManagerBuilder &, legacy::PassManagerBase &PM) {
// Add SROA after loop unrolling as more promotable patterns are
// exposed after small loops are fully unrolled.
PM.add(createSROAPass());
});
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// R600 Target Machine (R600 -> Cayman)		// R600 Target Machine (R600 -> Cayman)
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

R600TargetMachine::R600TargetMachine(const Target &T, const Triple &TT,		R600TargetMachine::R600TargetMachine(const Target &T, const Triple &TT,
StringRef CPU, StringRef FS,		StringRef CPU, StringRef FS,
[725 lines not shown]

llvm/lib/Transforms/IPO/PassManagerBuilder.cpp

[445 lines not shown]
if (EnableLoopInterchange)
MPM.add(createLoopInterchangePass()); // Interchange loops		MPM.add(createLoopInterchangePass()); // Interchange loops

// Unroll small loops		// Unroll small loops
MPM.add(createSimpleLoopUnrollPass(OptLevel, DisableUnrollLoops,		MPM.add(createSimpleLoopUnrollPass(OptLevel, DisableUnrollLoops,
ForgetAllSCEVInLoopUnroll));		ForgetAllSCEVInLoopUnroll));
addExtensionsToPM(EP_LoopOptimizerEnd, MPM);		addExtensionsToPM(EP_LoopOptimizerEnd, MPM);
// This ends the loop pass pipelines.		// This ends the loop pass pipelines.

		// Break up allocas that may now be splittable after loop unrolling.
		MPM.add(createSROAPass());

if (OptLevel > 1) {		if (OptLevel > 1) {
MPM.add(createMergedLoadStoreMotionPass()); // Merge ld/st in diamonds		MPM.add(createMergedLoadStoreMotionPass()); // Merge ld/st in diamonds
MPM.add(NewGVN ? createNewGVNPass()		MPM.add(NewGVN ? createNewGVNPass()
: createGVNPass(DisableGVNLoadPRE)); // Remove redundancies		: createGVNPass(DisableGVNLoadPRE)); // Remove redundancies
}		}
MPM.add(createMemCpyOptPass()); // Remove memcpy / form memset		MPM.add(createMemCpyOptPass()); // Remove memcpy / form memset
MPM.add(createSCCPPass()); // Constant prop with SCCP		MPM.add(createSCCPPass()); // Constant prop with SCCP

[787 lines not shown]

llvm/test/Other/opt-O2-pipeline.ll

	; RUN: opt -enable-new-pm=0 -mtriple=x86_64-- -O2 -debug-pass=Structure < %s -o /dev/null 2>&1 | FileCheck --check-prefixes=CHECK,%llvmcheckext %s			; RUN: opt -enable-new-pm=0 -mtriple=x86_64-- -O2 -debug-pass=Structure < %s -o /dev/null 2>&1 | FileCheck --check-prefixes=CHECK,%llvmcheckext %s

	; REQUIRES: asserts			; REQUIRES: asserts

	; CHECK-LABEL: Pass Arguments:			; CHECK-LABEL: Pass Arguments:
	; CHECK-NEXT: Target Transform Information			; CHECK-NEXT: Target Transform Information
	; CHECK-NEXT: Type-Based Alias Analysis			; CHECK-NEXT: Type-Based Alias Analysis
	; CHECK-NEXT: Scoped NoAlias Alias Analysis			; CHECK-NEXT: Scoped NoAlias Alias Analysis
	; CHECK-NEXT: Assumption Cache Tracker			; CHECK-NEXT: Assumption Cache Tracker
	; CHECK-NEXT: Target Library Information			; CHECK-NEXT: Target Library Information
	; CHECK-NEXT: FunctionPass Manager			; CHECK-NEXT: FunctionPass Manager
	; CHECK-NEXT: Module Verifier			; CHECK-NEXT: Module Verifier
	; CHECK-EXT: Good Bye World Pass			; CHECK-EXT: Good Bye World Pass
	; CHECK-NOEXT-NOT: Good Bye World Pass			; CHECK-NOEXT-NOT: Good Bye World Pass
	; CHECK-NEXT: Instrument function entry/exit with calls to e.g. mcount() (pre inlining)			; CHECK-NEXT: Instrument function entry/exit with calls to e.g. mcount() (pre inlining)
	; CHECK-NEXT: Simplify the CFG			; CHECK-NEXT: Simplify the CFG
	; CHECK-NEXT: Dominator Tree Construction			; CHECK-NEXT: Dominator Tree Construction
	; CHECK-NEXT: SROA			; CHECK-NEXT: SROA
	; CHECK-NEXT: Early CSE			; CHECK-NEXT: Early CSE
	; CHECK-NEXT: Lower 'expect' Intrinsics			; CHECK-NEXT: Lower 'expect' Intrinsics
	; CHECK-NEXT: Pass Arguments:			; CHECK-NEXT: Pass Arguments:
	; CHECK-NEXT: Target Library Information			; CHECK-NEXT: Target Library Information
	; CHECK-NEXT: Target Transform Information			; CHECK-NEXT: Target Transform Information
	; Target Pass Configuration			; Target Pass Configuration
	; CHECK: Type-Based Alias Analysis			; CHECK: Type-Based Alias Analysis
	; CHECK-NEXT: Scoped NoAlias Alias Analysis			; CHECK-NEXT: Scoped NoAlias Alias Analysis
	; CHECK-NEXT: Assumption Cache Tracker			; CHECK-NEXT: Assumption Cache Tracker
	; CHECK-NEXT: Profile summary info			; CHECK-NEXT: Profile summary info
	; CHECK-NEXT: ModulePass Manager			; CHECK-NEXT: ModulePass Manager
	; CHECK-NEXT: Force set function attributes			; CHECK-NEXT: Force set function attributes
	; CHECK-NEXT: Infer set function attributes			; CHECK-NEXT: Infer set function attributes
	; CHECK-NEXT: Interprocedural Sparse Conditional Constant Propagation			; CHECK-NEXT: Interprocedural Sparse Conditional Constant Propagation
	; CHECK-NEXT: FunctionPass Manager			; CHECK-NEXT: FunctionPass Manager
	[95 lines not shown]
	; CHECK-NEXT: LCSSA Verifier			; CHECK-NEXT: LCSSA Verifier
	; CHECK-NEXT: Loop-Closed SSA Form Pass			; CHECK-NEXT: Loop-Closed SSA Form Pass
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Loop Pass Manager			; CHECK-NEXT: Loop Pass Manager
	; CHECK-NEXT: Induction Variable Simplification			; CHECK-NEXT: Induction Variable Simplification
	; CHECK-NEXT: Recognize loop idioms			; CHECK-NEXT: Recognize loop idioms
	; CHECK-NEXT: Delete dead loops			; CHECK-NEXT: Delete dead loops
	; CHECK-NEXT: Unroll loops			; CHECK-NEXT: Unroll loops
				; CHECK-NEXT: SROA
				; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: MergedLoadStoreMotion			; CHECK-NEXT: MergedLoadStoreMotion
	; CHECK-NEXT: Phi Values Analysis			; CHECK-NEXT: Phi Values Analysis
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Memory Dependence Analysis			; CHECK-NEXT: Memory Dependence Analysis
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: Global Value Numbering			; CHECK-NEXT: Global Value Numbering
	[193 lines not shown]

llvm/test/Other/opt-O3-pipeline-enable-matrix.ll

	[133 lines not shown]
	; CHECK-NEXT: LCSSA Verifier			; CHECK-NEXT: LCSSA Verifier
	; CHECK-NEXT: Loop-Closed SSA Form Pass			; CHECK-NEXT: Loop-Closed SSA Form Pass
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Loop Pass Manager			; CHECK-NEXT: Loop Pass Manager
	; CHECK-NEXT: Induction Variable Simplification			; CHECK-NEXT: Induction Variable Simplification
	; CHECK-NEXT: Recognize loop idioms			; CHECK-NEXT: Recognize loop idioms
	; CHECK-NEXT: Delete dead loops			; CHECK-NEXT: Delete dead loops
	; CHECK-NEXT: Unroll loops			; CHECK-NEXT: Unroll loops
				; CHECK-NEXT: SROA
				; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: MergedLoadStoreMotion			; CHECK-NEXT: MergedLoadStoreMotion
	; CHECK-NEXT: Phi Values Analysis			; CHECK-NEXT: Phi Values Analysis
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Memory Dependence Analysis			; CHECK-NEXT: Memory Dependence Analysis
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: Global Value Numbering			; CHECK-NEXT: Global Value Numbering
	[200 lines not shown]

llvm/test/Other/opt-O3-pipeline.ll

	[133 lines not shown]
	; CHECK-NEXT: LCSSA Verifier			; CHECK-NEXT: LCSSA Verifier
	; CHECK-NEXT: Loop-Closed SSA Form Pass			; CHECK-NEXT: Loop-Closed SSA Form Pass
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Loop Pass Manager			; CHECK-NEXT: Loop Pass Manager
	; CHECK-NEXT: Induction Variable Simplification			; CHECK-NEXT: Induction Variable Simplification
	; CHECK-NEXT: Recognize loop idioms			; CHECK-NEXT: Recognize loop idioms
	; CHECK-NEXT: Delete dead loops			; CHECK-NEXT: Delete dead loops
	; CHECK-NEXT: Unroll loops			; CHECK-NEXT: Unroll loops
				; CHECK-NEXT: SROA
				; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: MergedLoadStoreMotion			; CHECK-NEXT: MergedLoadStoreMotion
	; CHECK-NEXT: Phi Values Analysis			; CHECK-NEXT: Phi Values Analysis
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Memory Dependence Analysis			; CHECK-NEXT: Memory Dependence Analysis
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: Global Value Numbering			; CHECK-NEXT: Global Value Numbering
	[193 lines not shown]

llvm/test/Other/opt-Os-pipeline.ll

	[114 lines not shown]
	; CHECK-NEXT: LCSSA Verifier			; CHECK-NEXT: LCSSA Verifier
	; CHECK-NEXT: Loop-Closed SSA Form Pass			; CHECK-NEXT: Loop-Closed SSA Form Pass
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Loop Pass Manager			; CHECK-NEXT: Loop Pass Manager
	; CHECK-NEXT: Induction Variable Simplification			; CHECK-NEXT: Induction Variable Simplification
	; CHECK-NEXT: Recognize loop idioms			; CHECK-NEXT: Recognize loop idioms
	; CHECK-NEXT: Delete dead loops			; CHECK-NEXT: Delete dead loops
	; CHECK-NEXT: Unroll loops			; CHECK-NEXT: Unroll loops
				; CHECK-NEXT: SROA
				; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: MergedLoadStoreMotion			; CHECK-NEXT: MergedLoadStoreMotion
	; CHECK-NEXT: Phi Values Analysis			; CHECK-NEXT: Phi Values Analysis
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Memory Dependence Analysis			; CHECK-NEXT: Memory Dependence Analysis
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: Global Value Numbering			; CHECK-NEXT: Global Value Numbering
	[193 lines not shown]

llvm/test/Transforms/PhaseOrdering/X86/SROA-after-loop-unrolling.ll

	[16 lines not shown]
	; Not only should the loops be unrolled, no alloca's should be left there.			; Not only should the loops be unrolled, no alloca's should be left there.

	target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	%"struct.std::array" = type { [6 x i32] }			%"struct.std::array" = type { [6 x i32] }

	define dso_local void @_Z3fooi(i32 %cnt) {			define dso_local void @_Z3fooi(i32 %cnt) {
	; OLDPM-LABEL: @_Z3fooi(			; CHECK-LABEL: @_Z3fooi(
	; OLDPM-NEXT: entry:			; CHECK-NEXT: entry:
	; OLDPM-NEXT: [[ARR:%.*]] = alloca %"struct.std::array", align 16			; CHECK-NEXT: [[INC:%.*]] = add nsw i32 [[CNT:%.*]], 1
	; OLDPM-NEXT: [[TMP0:%.*]] = bitcast %"struct.std::array"* [[ARR]] to i8*			; CHECK-NEXT: [[INC_1:%.*]] = add nsw i32 [[CNT]], 2
	; OLDPM-NEXT: call void @llvm.lifetime.start.p0i8(i64 24, i8* nonnull [[TMP0]])			; CHECK-NEXT: [[INC_2:%.*]] = add nsw i32 [[CNT]], 3
	; OLDPM-NEXT: [[ARRAYDECAY_I_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* [[ARR]], i64 0, i32 0, i64 0			; CHECK-NEXT: [[INC_3:%.*]] = add nsw i32 [[CNT]], 4
	; OLDPM-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* [[ARR]], i64 0, i32 0, i64 1			; CHECK-NEXT: [[INC_4:%.*]] = add nsw i32 [[CNT]], 5
	; OLDPM-NEXT: [[INCDEC_PTR_1:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* [[ARR]], i64 0, i32 0, i64 2			; CHECK-NEXT: [[INC_5:%.*]] = add nsw i32 [[CNT]], 6
	; OLDPM-NEXT: [[INCDEC_PTR_2:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* [[ARR]], i64 0, i32 0, i64 3			; CHECK-NEXT: call void @_Z3usei(i32 [[INC]])
	; OLDPM-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> undef, i32 [[CNT:%.*]], i32 0			; CHECK-NEXT: call void @_Z3usei(i32 [[INC_1]])
	; OLDPM-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> undef, <4 x i32> zeroinitializer			; CHECK-NEXT: call void @_Z3usei(i32 [[INC_2]])
	; OLDPM-NEXT: [[TMP3:%.*]] = add nsw <4 x i32> [[TMP2]], <i32 1, i32 2, i32 3, i32 4>			; CHECK-NEXT: call void @_Z3usei(i32 [[INC_3]])
	; OLDPM-NEXT: [[TMP4:%.*]] = bitcast %"struct.std::array"* [[ARR]] to <4 x i32>*			; CHECK-NEXT: call void @_Z3usei(i32 [[INC_4]])
	; OLDPM-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* [[TMP4]], align 16			; CHECK-NEXT: call void @_Z3usei(i32 [[INC_5]])
	; OLDPM-NEXT: [[INCDEC_PTR_3:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* [[ARR]], i64 0, i32 0, i64 4			; CHECK-NEXT: ret void
	; OLDPM-NEXT: [[INC_4:%.*]] = add nsw i32 [[CNT]], 5
	; OLDPM-NEXT: store i32 [[INC_4]], i32* [[INCDEC_PTR_3]], align 16
	; OLDPM-NEXT: [[INC_5:%.*]] = add nsw i32 [[CNT]], 6
	; OLDPM-NEXT: [[TMP5:%.*]] = load i32, i32* [[ARRAYDECAY_I_I_I]], align 16
	; OLDPM-NEXT: call void @_Z3usei(i32 [[TMP5]])
	; OLDPM-NEXT: [[TMP6:%.*]] = load i32, i32* [[INCDEC_PTR]], align 4
	; OLDPM-NEXT: call void @_Z3usei(i32 [[TMP6]])
	; OLDPM-NEXT: [[TMP7:%.*]] = load i32, i32* [[INCDEC_PTR_1]], align 8
	; OLDPM-NEXT: call void @_Z3usei(i32 [[TMP7]])
	; OLDPM-NEXT: [[TMP8:%.*]] = load i32, i32* [[INCDEC_PTR_2]], align 4
	; OLDPM-NEXT: call void @_Z3usei(i32 [[TMP8]])
	; OLDPM-NEXT: [[TMP9:%.*]] = load i32, i32* [[INCDEC_PTR_3]], align 16
	; OLDPM-NEXT: call void @_Z3usei(i32 [[TMP9]])
	; OLDPM-NEXT: call void @_Z3usei(i32 [[INC_5]])
	; OLDPM-NEXT: call void @llvm.lifetime.end.p0i8(i64 24, i8* nonnull [[TMP0]])
	; OLDPM-NEXT: ret void
	;
	; NEWPM-LABEL: @_Z3fooi(
	; NEWPM-NEXT: entry:
	; NEWPM-NEXT: [[INC:%.*]] = add nsw i32 [[CNT:%.*]], 1
	; NEWPM-NEXT: [[INC_1:%.*]] = add nsw i32 [[CNT]], 2
	; NEWPM-NEXT: [[INC_2:%.*]] = add nsw i32 [[CNT]], 3
	; NEWPM-NEXT: [[INC_3:%.*]] = add nsw i32 [[CNT]], 4
	; NEWPM-NEXT: [[INC_4:%.*]] = add nsw i32 [[CNT]], 5
	; NEWPM-NEXT: [[INC_5:%.*]] = add nsw i32 [[CNT]], 6
	; NEWPM-NEXT: call void @_Z3usei(i32 [[INC]])
	; NEWPM-NEXT: call void @_Z3usei(i32 [[INC_1]])
	; NEWPM-NEXT: call void @_Z3usei(i32 [[INC_2]])
	; NEWPM-NEXT: call void @_Z3usei(i32 [[INC_3]])
	; NEWPM-NEXT: call void @_Z3usei(i32 [[INC_4]])
	; NEWPM-NEXT: call void @_Z3usei(i32 [[INC_5]])
	; NEWPM-NEXT: ret void
	;			;
	entry:			entry:
	%cnt.addr = alloca i32			%cnt.addr = alloca i32
	%arr = alloca %"struct.std::array"			%arr = alloca %"struct.std::array"
	%__range1 = alloca %"struct.std::array"*			%__range1 = alloca %"struct.std::array"*
	%__begin1 = alloca i32*			%__begin1 = alloca i32*
	%__end1 = alloca i32*			%__end1 = alloca i32*
	%elt = alloca i32*			%elt = alloca i32*
	[155 lines not shown]
