This is an archive of the discontinued LLVM Phabricator instance.

Differential D61726

[Pass Pipeline] Run another round of reassociation after loop pipeline
Abandoned, Public

Authored by nemanjai on May 9 2019, 5:38 AM.

Details

Reviewers

chandlerc
majnemer
spatel
echristo
tstellar
efriedma

Summary

Unrolling can create code that looks a little silly and InstCombine doesn't clean it up. The test case added in this patch ends up with a series of adds in the loop body rather than a shift and add. Reassociation cleans that type of code up, but we don't run it after unrolling.
This patch just adds another round of reassociation after loop unrolling (similarly to what we do with InstCombine).
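
As a rough illustration of the pattern described above (a hypothetical reduction, not the test case from this patch), the sketch below contrasts a chain of adds, as an unroll-by-four might leave behind, with the shift-and-add form that reassociation can expose. The function names `addChain` and `shiftAndAdd` are made up for this example.

```cpp
#include <cstdint>

// Hypothetical stand-in for an unrolled loop body: the same value is added
// once per unrolled iteration, giving a serial chain of four adds.
constexpr std::uint64_t addChain(std::uint64_t acc, std::uint64_t x) {
  return (((acc + x) + x) + x) + x;
}

// The regrouped form: the four adds of x become x * 4, which is a single
// shift, followed by one add.
constexpr std::uint64_t shiftAndAdd(std::uint64_t acc, std::uint64_t x) {
  return acc + (x << 2);
}

// Unsigned arithmetic wraps, so the two forms agree even on overflow.
static_assert(addChain(10, 3) == shiftAndAdd(10, 3), "small values agree");
static_assert(addChain(7, UINT64_MAX) == shiftAndAdd(7, UINT64_MAX),
              "wrapping values agree");
```

The chain has a dependency depth of four adds, while the regrouped form needs one shift and one add, which is why the cleanup matters inside a hot loop.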

Performance measurements on PPC show some improvements on a few benchmarks and no noticeable degradations.

Diff Detail

Repository: rL LLVM

Event Timeline

nemanjai created this revision. May 9 2019, 5:38 AM

Herald added a project: Restricted Project. May 9 2019, 5:38 AM

Herald added subscribers: jsji, dexonsmith, steven_wu and 6 others.

nemanjai added a reviewer: tstellar. May 9 2019, 5:39 AM

spatel mentioned this in rL360340: [LoopVectorizer] fix test file to not run the entire -O3 pipeline. May 9 2019, 6:43 AM

spatel mentioned this in rG012adfbb96cb: [LoopVectorizer] fix test file to not run the entire -O3 pipeline.

Whitney added a subscriber: Whitney. May 9 2019, 6:53 AM

The motivation makes sense to me, but someone else should also review this patch in case there's a better way.
We also need to know if there's a compile-time impact. Get data for building test-suite, clang itself, SPEC, or some other benchmarks?

test/Transforms/LoopVectorize/X86/masked_load_store.ll
Lines 2–4 (On Diff #198800): Regardless of anything else, this test file was over-reaching, so I fixed that problem: rL360340. If you update/rebase, this should not wiggle with this patch now.
test/Transforms/Reassociate/reassociate-after-unroll.ll
Line 1 (On Diff #198800): This test file belongs in test/Transforms/PhaseOrdering. I prefer to have the baseline test with complete, auto-generated checks (utils/update_test_checks.py) committed as a preliminary step, so we can see the before/after diff in this review. If you're updating the new pass manager in this patch, this test should have another RUN line to exercise/verify that path.

Running reassociate after unroll probably makes sense. But I'd like to see compile-time numbers.

How carefully have you considered the exact placement? It looks like you're using different placement for the legacy vs. new pass manager. Do we want to run before the late LICM pass?

dexonsmith removed a subscriber: dexonsmith. May 9 2019, 11:47 AM

In D61726#1497009, @efriedma wrote:

Running reassociate after unroll probably makes sense. But I'd like to see compile-time numbers.

How carefully have you considered the exact placement? It looks like you're using different placement for the legacy vs. new pass manager. Do we want to run before the late LICM pass?

I will collect some compile time numbers with test-suite.
Regarding placement: I didn't really consider it very carefully, as I don't really know what the tradeoffs would be for various positions. The one thing that seems clear is that it needs to run after unrolling. However, where in the pipeline after unrolling... I am most certainly open to suggestions and can experiment with a few suggested options.

Thanks again for your feedback.

test/Transforms/LoopVectorize/X86/masked_load_store.ll
Lines 2–4 (On Diff #198800): Will do, thank you.
test/Transforms/Reassociate/reassociate-after-unroll.ll
Line 1 (On Diff #198800): I will move it and add a RUN line for the NPM. Thanks for the suggestions.

nemanjai mentioned this in rL360426: [Pass Pipeline][NFC] Add a test prior to committing D61726. May 10 2019, 6:45 AM

nemanjai mentioned this in rGcfc89896e018: [Pass Pipeline][NFC] Add a test prior to committing D61726.

Move the newly added test case and update it to only show the different behaviour (after committing it to show the current behaviour in r360426)
Move the additional run of reassociation before the late LICM pass. I assumed that this is a good place for it in the pipeline since LICM might move things out of the loop and potentially take away some opportunities. This is just based on a weak hunch and I am very much open to suggestions for a better place for this in the pipeline.

I have also run CTMark with and without the patch and it shows a minimal increase in compile time. This was run on a quiet PPC (Power9) machine set up for performance measurements with -j1:

Tests: 10
Metric: compile_time

Program                                        results.base results.modified diff
 test-suite :: CTMark/kimwitu++/kc.test         37.41        37.67            0.7%
 test-suite...ark/tramp3d-v4/tramp3d-v4.test    76.74        77.09            0.4%
 test-suite :: CTMark/Bullet/bullet.test        90.14        90.48            0.4%
 test-suite :: CTMark/SPASS/SPASS.test          42.44        42.57            0.3%
 test-suite...:: CTMark/sqlite3/sqlite3.test    45.18        45.30            0.3%
 test-suite...Mark/mafft/pairlocalalign.test    40.60        40.67            0.2%
 test-suite :: CTMark/lencod/lencod.test        63.90        64.01            0.2%
 test-suite...TMark/7zip/7zip-benchmark.test   134.58       134.78            0.2%
 test-suite...-typeset/consumer-typeset.test    35.08        35.05           -0.1%
 test-suite...:: CTMark/ClamAV/clamscan.test    51.71        51.72            0.0%
 Geomean difference                                                           0.3%
       results.base  results.modified       diff
count  10.000000     10.000000         10.000000
mean   61.778200     61.933290         0.002511
std    31.338327     31.402827         0.002218
min    35.080900     35.050700        -0.000861
25%    41.058275     41.143850         0.001559
50%    48.446250     48.509000         0.002115
75%    73.530500     73.816250         0.003576
max    134.579000    134.781900        0.006955
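
For readers who want to check the summary figure, here is a minimal sketch of how a geomean difference can be computed from the rounded per-benchmark times in the table above. That the comparison tool computes it exactly this way is an assumption, and the names `Timing`, `geomeanDifference`, and `ctmarkRows` are made up for this example.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One benchmark row: compile time before and after the patch, in seconds.
struct Timing {
  double base;
  double modified;
};

// Geometric mean of the modified/base ratios, returned as a relative
// difference (0.003 means +0.3%). Returns 0 for an empty input.
inline double geomeanDifference(const std::vector<Timing> &rows) {
  if (rows.empty())
    return 0.0;
  double logSum = 0.0;
  for (const Timing &t : rows)
    logSum += std::log(t.modified / t.base);
  return std::exp(logSum / static_cast<double>(rows.size())) - 1.0;
}

// Rounded compile times copied from the CTMark table above.
inline std::vector<Timing> ctmarkRows() {
  return {{37.41, 37.67}, {76.74, 77.09},   {90.14, 90.48}, {42.44, 42.57},
          {45.18, 45.30}, {40.60, 40.67},   {63.90, 64.01}, {134.58, 134.78},
          {35.08, 35.05}, {51.71, 51.72}};
}
```

Calling `geomeanDifference(ctmarkRows())` on these rounded inputs gives roughly 0.25%, which is consistent with the reported 0.3% after rounding to one decimal place.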

Looks like the new test is failing: http://lab.llvm.org:8011/builders/clang-cmake-x86_64-avx2-linux/builds/9439/steps/ninja%20check%201/logs/FAIL%3A%20LLVM%3A%3Areassociate-after-unroll.ll

In D61726#1498006, @thakis wrote:

Looks like the new test is failing: http://lab.llvm.org:8011/builders/clang-cmake-x86_64-avx2-linux/builds/9439/steps/ninja%20check%201/logs/FAIL%3A%20LLVM%3A%3Areassociate-after-unroll.ll

Yeah, I pulled it out. I'm sorry about that. I'm not sure how to make this test case work across all targets. Adding the triple didn't seem to work. I'm looking into how to do this.

Adding the triple didn't seem to work

If you're dealing with certain passes like unrolling, they depend on the target actually being compiled, because we query the target for heuristics. You can write something like "REQUIRES: powerpc-registered-target" if necessary.

nemanjai mentioned this in rL360620: [Pass Pipeline][NFC] Add a test prior to committing D61726. May 13 2019, 2:15 PM

nemanjai mentioned this in rG1d662316cbff: [Pass Pipeline][NFC] Add a test prior to committing D61726.

Update the new test case. Thanks @efriedma for the tip.

Herald added a subscriber: kbarton. May 13 2019, 6:35 PM

Remove an unrelated change that snuck in by accident.

In D61726#1497988, @nemanjai wrote:

I have also run CTMark with and without the patch and it shows a minimal increase in compile time. This was run on a quiet PPC (Power9) machine set up for performance measurements with -j1:
Geomean difference 0.3%

It would be interesting to see how that result translates on a more typical x86 build machine. Either way, I suspect we'll get different opinions about whether a 0.3% time increase is minimal and whether that cost is worth paying for the runtime perf gains. This might be a case for differentiating between -O2 and -O3?

It would be interesting to see how that result translates on a more typical x86 build machine. Either way, I suspect we'll get different opinions about whether a 0.3% time increase is minimal and whether that cost is worth paying for the runtime perf gains. This might be a case for differentiating between -O2 and -O3?

I don't really have access to a typical x86 server. I can get the numbers on my laptop but I'm not sure how typical that is. Would that suffice?
Also, I am happy to guard this with an -O3 requirement.

In D61726#1502041, @nemanjai wrote:

It would be interesting to see how that result translates on a more typical x86 build machine. Either way, I suspect we'll get different opinions about whether a 0.3% time increase is minimal and whether that cost is worth paying for the runtime perf gains. This might be a case for differentiating between -O2 and -O3?

I don't really have access to a typical x86 server. I can get the numbers on my laptop but I'm not sure how typical that is. Would that suffice?

IMO, it's not required for you to gather more data, but some form of x86 is the common case, so that would be a better data point for most people. If we don't get that experiment pre-commit, then I'd expect some x86 bot to flag this change if it's a problem.

Also, I am happy to guard this with an -O3 requirement.

That would remove potential controversy (again, just my opinion) because that's how we limited 'AggressiveInstCombine', but let's see if anyone else (@efriedma @echristo ?) has a different idea.

-O3 makes sense, probably. I mean, reassociate is unlikely to hurt performance, but a second reassociate pass is unlikely to help much unless some pass like unrolling generates new code after the first reassociate.

Can you post the actual performance results? It's hard to judge whether 0.3% cost across the entire compiler is acceptable without knowing the benefits.
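
The -O3 guard discussed above could look roughly like the following sketch. It models the pipeline as a plain list of pass names rather than using the real pass manager classes, so `buildPostUnrollPasses` and the string names are hypothetical stand-ins, not LLVM APIs, and the `optLevel >= 3` policy is only the option floated in this thread.

```cpp
#include <string>
#include <vector>

// Hypothetical model of the post-unroll part of the pipeline as an ordered
// list of pass names. This is not how LLVM represents its pipeline.
inline std::vector<std::string> buildPostUnrollPasses(unsigned optLevel) {
  std::vector<std::string> passes;
  passes.push_back("loop-unroll");
  passes.push_back("instcombine");
  // Assumed policy from the discussion: only pay the compile-time cost of a
  // second reassociation run at -O3, and place it before the late LICM run.
  if (optLevel >= 3)
    passes.push_back("reassociate");
  passes.push_back("licm");
  return passes;
}
```

With this shape, `buildPostUnrollPasses(2)` leaves the -O2 pipeline unchanged, while `buildPostUnrollPasses(3)` inserts the extra run between InstCombine and LICM.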

fhahn added a subscriber: fhahn. May 21 2019, 6:06 AM

sidorovd mentioned this in rG80243d9f4eac: [LoopVectorizer] fix test file to not run the entire -O3 pipeline. May 30 2019, 8:57 AM

sidorovd mentioned this in rG98d4e287e73a: [Pass Pipeline][NFC] Add a test prior to committing D61726. May 30 2019, 9:04 AM

sidorovd mentioned this in rG7318ef91d48d: [Pass Pipeline][NFC] Add a test prior to committing D61726. May 30 2019, 9:16 AM

sidorovd mentioned this in rGfbfa5ecce2ac: [LoopVectorizer] fix test file to not run the entire -O3 pipeline. May 30 2019, 10:01 AM

sidorovd mentioned this in rGc1f9ee0e11aa: [Pass Pipeline][NFC] Add a test prior to committing D61726. May 30 2019, 10:06 AM

sidorovd mentioned this in rG94aeb61eeacc: [Pass Pipeline][NFC] Add a test prior to committing D61726. May 30 2019, 10:16 AM

This turns out not to be worth the added compile time.

Herald added subscribers: kerbowa, wuzish, hiraditya. May 22 2020, 3:19 AM

spatel mentioned this in rG2f7c24fe303f: [InstCombine] (A + B) + B --> A + (B << 1). May 22 2020, 9:06 AM

We get a bit of improvement with:
rG2f7c24fe303f
...but it's not ideal. It does provide another case where "early-cse" would help if it ran later.
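
The fold named in that commit title, (A + B) + B --> A + (B << 1), can be checked with a small sketch. This only demonstrates the arithmetic identity on wrapping unsigned integers; it says nothing about how InstCombine implements the transform, and the function names are made up for this example.

```cpp
#include <cstdint>

// Left-hand side of the fold: two dependent adds of the same operand.
constexpr std::uint32_t addTwice(std::uint32_t a, std::uint32_t b) {
  return (a + b) + b;
}

// Right-hand side of the fold: one shift and one add.
constexpr std::uint32_t addShifted(std::uint32_t a, std::uint32_t b) {
  return a + (b << 1);
}

// Both sides compute a + 2*b modulo 2^32, so they agree even when the
// doubled operand wraps around.
static_assert(addTwice(5, 6) == addShifted(5, 6), "plain case");
static_assert(addTwice(1, 0x80000000u) == addShifted(1, 0x80000000u),
              "doubling wraps to zero");
static_assert(addTwice(0xFFFFFFFFu, 0xFFFFFFFFu) ==
                  addShifted(0xFFFFFFFFu, 0xFFFFFFFFu),
              "both operands at the maximum");
```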

Revision Contents

Path                                                        Size
lib/Passes/PassBuilder.cpp                                  1 line
lib/Transforms/IPO/PassManagerBuilder.cpp                   1 line
test/CodeGen/AMDGPU/simplify-libcalls.ll                    22 lines
test/Other/new-pm-defaults.ll                               1 line
test/Other/new-pm-thinlto-defaults.ll                       1 line
test/Other/opt-O2-pipeline.ll                               3 lines
test/Other/opt-O3-pipeline.ll                               3 lines
test/Other/opt-Os-pipeline.ll                               3 lines
test/Transforms/PhaseOrdering/reassociate-after-unroll.ll   28 lines

Diff 199355

lib/Passes/PassBuilder.cpp

@@ ModulePassManager PassBuilder::buildModuleOptimizationPipeline(
   // We do UnrollAndJam in a separate LPM to ensure it happens before unroll
   if (EnableUnrollAndJam) {
     OptimizePM.addPass(
         createFunctionToLoopPassAdaptor(LoopUnrollAndJamPass(Level)));
   }
   OptimizePM.addPass(LoopUnrollPass(LoopUnrollOptions(Level)));
   OptimizePM.addPass(WarnMissedTransformationsPass());
   OptimizePM.addPass(InstCombinePass());
+  OptimizePM.addPass(ReassociatePass());
   OptimizePM.addPass(RequireAnalysisPass<OptimizationRemarkEmitterAnalysis, Function>());
   OptimizePM.addPass(createFunctionToLoopPassAdaptor(
       LICMPass(PTO.LicmMssaOptCap, PTO.LicmMssaNoAccForPromotionCap),
       DebugLogging));

   // Now that we've vectorized and unrolled loops, we may have more refined
   // alignment information, try to re-derive it here.
   OptimizePM.addPass(AlignmentFromAssumptionsPass());

lib/Transforms/IPO/PassManagerBuilder.cpp

@@ void PassManagerBuilder::populateModulePassManager(

   // Unroll small loops
   MPM.add(createLoopUnrollPass(OptLevel, DisableUnrollLoops,
                                ForgetAllSCEVInLoopUnroll));

   if (!DisableUnrollLoops) {
     // LoopUnroll may generate some redundency to cleanup.
     addInstructionCombiningPass(MPM);
+    MPM.add(createReassociatePass());

     // Runtime unrolling will introduce runtime check in loop prologue. If the
     // unrolled loop is a inner loop, then the prologue will be inside the
     // outer loop. LICM pass can help to promote the runtime check out if the
     // checked value is loop invariant.
     MPM.add(createLICMPass(LicmMssaOptCap, LicmMssaNoAccForPromotionCap));
   }

test/CodeGen/AMDGPU/simplify-libcalls.ll

@@ entry:
   %call = tail call fast float @_Z3powff(float %tmp, float -5.000000e-01)
   store float %call, float addrspace(1)* %a, align 4
   ret void
 }

 ; GCN-LABEL: {{^}}define amdgpu_kernel void @test_pow_c
 ; GCN: %__powx2 = fmul fast float %tmp, %tmp
 ; GCN: %__powx21 = fmul fast float %__powx2, %__powx2
-; GCN: %__powx22 = fmul fast float %__powx2, %tmp
-; GCN: %[[r0:.*]] = fmul fast float %__powx21, %__powx21
-; GCN: %__powprod3 = fmul fast float %[[r0]], %__powx22
+; GCN: %[[r0:.*]] = fmul fast float %__powx2, %tmp
+; GCN: %__powx22 = fmul fast float %[[r0]], %__powx21
+; GCN: %__powprod3 = fmul fast float %__powx22, %__powx21
 define amdgpu_kernel void @test_pow_c(float addrspace(1)* nocapture %a) {
 entry:
   %arrayidx = getelementptr inbounds float, float addrspace(1)* %a, i64 1
   %tmp = load float, float addrspace(1)* %arrayidx, align 4
   %call = tail call fast float @_Z3powff(float %tmp, float 1.100000e+01)
   store float %call, float addrspace(1)* %a, align 4
   ret void
 }

 ; GCN-LABEL: {{^}}define amdgpu_kernel void @test_powr_c
 ; GCN: %__powx2 = fmul fast float %tmp, %tmp
 ; GCN: %__powx21 = fmul fast float %__powx2, %__powx2
-; GCN: %__powx22 = fmul fast float %__powx2, %tmp
-; GCN: %[[r0:.*]] = fmul fast float %__powx21, %__powx21
-; GCN: %__powprod3 = fmul fast float %[[r0]], %__powx22
+; GCN: %[[r0:.*]] = fmul fast float %__powx2, %tmp
+; GCN: %__powx22 = fmul fast float %[[r0]], %__powx21
+; GCN: %__powprod3 = fmul fast float %__powx22, %__powx21
 define amdgpu_kernel void @test_powr_c(float addrspace(1)* nocapture %a) {
 entry:
   %arrayidx = getelementptr inbounds float, float addrspace(1)* %a, i64 1
   %tmp = load float, float addrspace(1)* %arrayidx, align 4
   %call = tail call fast float @_Z4powrff(float %tmp, float 1.100000e+01)
   store float %call, float addrspace(1)* %a, align 4
   ret void
 }

 declare float @_Z4powrff(float, float)

 ; GCN-LABEL: {{^}}define amdgpu_kernel void @test_pown_c
 ; GCN: %__powx2 = fmul fast float %tmp, %tmp
 ; GCN: %__powx21 = fmul fast float %__powx2, %__powx2
-; GCN: %__powx22 = fmul fast float %__powx2, %tmp
-; GCN: %[[r0:.*]] = fmul fast float %__powx21, %__powx21
-; GCN: %__powprod3 = fmul fast float %[[r0]], %__powx22
+; GCN: %[[r0:.*]] = fmul fast float %__powx2, %tmp
+; GCN: %__powx22 = fmul fast float %[[r0]], %__powx21
+; GCN: %__powprod3 = fmul fast float %__powx22, %__powx21
 define amdgpu_kernel void @test_pown_c(float addrspace(1)* nocapture %a) {
 entry:
   %arrayidx = getelementptr inbounds float, float addrspace(1)* %a, i64 1
   %tmp = load float, float addrspace(1)* %arrayidx, align 4
   %call = tail call fast float @_Z4pownfi(float %tmp, i32 11)
   store float %call, float addrspace(1)* %a, align 4
   ret void
 }

 declare float @_Z4pownfi(float, i32)

 ; GCN-LABEL: {{^}}define amdgpu_kernel void @test_pow
 ; GCN-POSTLINK: tail call fast float @_Z3powff(float %tmp, float 1.013000e+03)
 ; GCN-PRELINK: %__fabs = tail call fast float @_Z4fabsf(float %tmp)
 ; GCN-PRELINK: %__log2 = tail call fast float @_Z4log2f(float %__fabs)
 ; GCN-PRELINK: %__ylogx = fmul fast float %__log2, 1.013000e+03
 ; GCN-PRELINK: %__exp2 = tail call fast float @_Z4exp2f(float %__ylogx)
 ; GCN-PRELINK: %[[r0:.*]] = bitcast float %tmp to i32
 ; GCN-PRELINK: %__pow_sign = and i32 %[[r0]], -2147483648
 ; GCN-PRELINK: %[[r1:.*]] = bitcast float %__exp2 to i32
-; GCN-PRELINK: %[[r2:.*]] = or i32 %__pow_sign, %[[r1]]
+; GCN-PRELINK: %[[r2:.*]] = or i32 %[[r1]], %__pow_sign
 ; GCN-PRELINK: %[[r3:.*]] = bitcast float addrspace(1)* %a to i32 addrspace(1)*
 ; GCN-PRELINK: store i32 %[[r2]], i32 addrspace(1)* %[[r3]], align 4
 define amdgpu_kernel void @test_pow(float addrspace(1)* nocapture %a) {
 entry:
   %tmp = load float, float addrspace(1)* %a, align 4
   %call = tail call fast float @_Z3powff(float %tmp, float 1.013000e+03)
   store float %call, float addrspace(1)* %a, align 4
   ret void
(26 lines not shown)
 ; GCN-PRELINK: %__log2 = tail call fast float @_Z4log2f(float %__fabs)
 ; GCN-PRELINK: %pownI2F = sitofp i32 %conv to float
 ; GCN-PRELINK: %__ylogx = fmul fast float %__log2, %pownI2F
 ; GCN-PRELINK: %__exp2 = tail call fast float @_Z4exp2f(float %__ylogx)
 ; GCN-PRELINK: %__yeven = shl i32 %conv, 31
 ; GCN-PRELINK: %[[r0:.*]] = bitcast float %tmp to i32
 ; GCN-PRELINK: %__pow_sign = and i32 %__yeven, %[[r0]]
 ; GCN-PRELINK: %[[r1:.*]] = bitcast float %__exp2 to i32
-; GCN-PRELINK: %[[r2:.*]] = or i32 %__pow_sign, %[[r1]]
+; GCN-PRELINK: %[[r2:.*]] = or i32 %[[r1]], %__pow_sign
 ; GCN-PRELINK: %[[r3:.*]] = bitcast float addrspace(1)* %a to i32 addrspace(1)*
 ; GCN-PRELINK: store i32 %[[r2]], i32 addrspace(1)* %[[r3]], align 4
 define amdgpu_kernel void @test_pown(float addrspace(1)* nocapture %a) {
 entry:
   %tmp = load float, float addrspace(1)* %a, align 4
   %arrayidx1 = getelementptr inbounds float, float addrspace(1)* %a, i64 1
   %tmp1 = load float, float addrspace(1)* %arrayidx1, align 4
   %conv = fptosi float %tmp1 to i32

test/Other/new-pm-defaults.ll

 ; CHECK-O-NEXT: Running analysis: LoopAccessAnalysis
 ; CHECK-O-NEXT: Running pass: InstCombinePass
 ; CHECK-O-NEXT: Running pass: SimplifyCFGPass
 ; CHECK-O-NEXT: Running pass: SLPVectorizerPass
 ; CHECK-O-NEXT: Running pass: InstCombinePass
 ; CHECK-O-NEXT: Running pass: LoopUnrollPass
 ; CHECK-O-NEXT: Running pass: WarnMissedTransformationsPass
 ; CHECK-O-NEXT: Running pass: InstCombinePass
+; CHECK-O-NEXT: Running pass: ReassociatePass
 ; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}OptimizationRemarkEmitterAnalysis
 ; CHECK-O-NEXT: Running pass: FunctionToLoopPassAdaptor<{{.*}}LICMPass
 ; CHECK-O-NEXT: Starting llvm::Function pass manager run.
 ; CHECK-O-NEXT: Running pass: LoopSimplifyPass
 ; CHECK-O-NEXT: Running pass: LCSSAPass
 ; CHECK-O-NEXT: Finished llvm::Function pass manager run.
 ; CHECK-O-NEXT: Running pass: AlignmentFromAssumptionsPass
 ; CHECK-O-NEXT: Running pass: LoopSinkPass

test/Other/new-pm-thinlto-defaults.ll

 ; CHECK-POSTLINK-O-NEXT: Running analysis: LoopAccessAnalysis
 ; CHECK-POSTLINK-O-NEXT: Running pass: InstCombinePass
 ; CHECK-POSTLINK-O-NEXT: Running pass: SimplifyCFGPass
 ; CHECK-POSTLINK-O-NEXT: Running pass: SLPVectorizerPass
 ; CHECK-POSTLINK-O-NEXT: Running pass: InstCombinePass
 ; CHECK-POSTLINK-O-NEXT: Running pass: LoopUnrollPass
 ; CHECK-POSTLINK-O-NEXT: Running pass: WarnMissedTransformationsPass
 ; CHECK-POSTLINK-O-NEXT: Running pass: InstCombinePass
+; CHECK-POSTLINK-O-NEXT: Running pass: ReassociatePass
 ; CHECK-POSTLINK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}OptimizationRemarkEmitterAnalysis
 ; CHECK-POSTLINK-O-NEXT: Running pass: FunctionToLoopPassAdaptor<{{.*}}LICMPass
 ; CHECK-POSTLINK-O-NEXT: Starting llvm::Function pass manager run
 ; CHECK-POSTLINK-O-NEXT: Running pass: LoopSimplifyPass
 ; CHECK-POSTLINK-O-NEXT: Running pass: LCSSAPass
 ; CHECK-POSTLINK-O-NEXT: Finished llvm::Function pass manager run
 ; CHECK-POSTLINK-O-NEXT: Running pass: AlignmentFromAssumptionsPass
 ; CHECK-POSTLINK-O-NEXT: Running pass: LoopSinkPass

test/Other/opt-O2-pipeline.ll

 ; CHECK-NEXT: Loop-Closed SSA Form Pass
 ; CHECK-NEXT: Scalar Evolution Analysis
 ; CHECK-NEXT: Loop Pass Manager
 ; CHECK-NEXT: Unroll loops
 ; CHECK-NEXT: Lazy Branch Probability Analysis
 ; CHECK-NEXT: Lazy Block Frequency Analysis
 ; CHECK-NEXT: Optimization Remark Emitter
 ; CHECK-NEXT: Combine redundant instructions
+; CHECK-NEXT: Reassociate expressions
 ; CHECK-NEXT: Canonicalize natural loops
 ; CHECK-NEXT: LCSSA Verifier
 ; CHECK-NEXT: Loop-Closed SSA Form Pass
+; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
+; CHECK-NEXT: Function Alias Analysis Results
 ; CHECK-NEXT: Scalar Evolution Analysis
 ; CHECK-NEXT: Loop Pass Manager
 ; CHECK-NEXT: Loop Invariant Code Motion
 ; CHECK-NEXT: Lazy Branch Probability Analysis
 ; CHECK-NEXT: Lazy Block Frequency Analysis
 ; CHECK-NEXT: Optimization Remark Emitter
 ; CHECK-NEXT: Warn about non-applied transformations
 ; CHECK-NEXT: Alignment from assumptions

test/Other/opt-O3-pipeline.ll

 ; CHECK-NEXT: Loop-Closed SSA Form Pass
 ; CHECK-NEXT: Scalar Evolution Analysis
 ; CHECK-NEXT: Loop Pass Manager
 ; CHECK-NEXT: Unroll loops
 ; CHECK-NEXT: Lazy Branch Probability Analysis
 ; CHECK-NEXT: Lazy Block Frequency Analysis
 ; CHECK-NEXT: Optimization Remark Emitter
 ; CHECK-NEXT: Combine redundant instructions
+; CHECK-NEXT: Reassociate expressions
 ; CHECK-NEXT: Canonicalize natural loops
 ; CHECK-NEXT: LCSSA Verifier
 ; CHECK-NEXT: Loop-Closed SSA Form Pass
+; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
+; CHECK-NEXT: Function Alias Analysis Results
 ; CHECK-NEXT: Scalar Evolution Analysis
 ; CHECK-NEXT: Loop Pass Manager
 ; CHECK-NEXT: Loop Invariant Code Motion
 ; CHECK-NEXT: Lazy Branch Probability Analysis
 ; CHECK-NEXT: Lazy Block Frequency Analysis
 ; CHECK-NEXT: Optimization Remark Emitter
 ; CHECK-NEXT: Warn about non-applied transformations
 ; CHECK-NEXT: Alignment from assumptions

test/Other/opt-Os-pipeline.ll

	Show First 20 Lines • Show All 227 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: Loop-Closed SSA Form Pass			; CHECK-NEXT: Loop-Closed SSA Form Pass
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Loop Pass Manager			; CHECK-NEXT: Loop Pass Manager
	; CHECK-NEXT: Unroll loops			; CHECK-NEXT: Unroll loops
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: Combine redundant instructions			; CHECK-NEXT: Combine redundant instructions
				; CHECK-NEXT: Reassociate expressions
	; CHECK-NEXT: Canonicalize natural loops			; CHECK-NEXT: Canonicalize natural loops
	; CHECK-NEXT: LCSSA Verifier			; CHECK-NEXT: LCSSA Verifier
	; CHECK-NEXT: Loop-Closed SSA Form Pass			; CHECK-NEXT: Loop-Closed SSA Form Pass
				; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
				; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Loop Pass Manager			; CHECK-NEXT: Loop Pass Manager
	; CHECK-NEXT: Loop Invariant Code Motion			; CHECK-NEXT: Loop Invariant Code Motion
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: Warn about non-applied transformations			; CHECK-NEXT: Warn about non-applied transformations
	; CHECK-NEXT: Alignment from assumptions			; CHECK-NEXT: Alignment from assumptions

test/Transforms/PhaseOrdering/reassociate-after-unroll.ll

	; CHECK-NEXT: [[ADD_LCSSA_PH:%.*]] = phi i64 [ undef, [[FOR_BODY_LR_PH]] ], [ [[ADD_7:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[ADD_LCSSA_PH:%.*]] = phi i64 [ undef, [[FOR_BODY_LR_PH]] ], [ [[ADD_7:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[K_05_UNR:%.*]] = phi i64 [ 1, [[FOR_BODY_LR_PH]] ], [ [[AND:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[K_05_UNR:%.*]] = phi i64 [ 1, [[FOR_BODY_LR_PH]] ], [ [[AND:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[LCMP_MOD:%.*]] = icmp eq i64 [[XTRAITER]], 0			; CHECK-NEXT: [[LCMP_MOD:%.*]] = icmp eq i64 [[XTRAITER]], 0
	; CHECK-NEXT: br i1 [[LCMP_MOD]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY_EPIL:%.*]]			; CHECK-NEXT: br i1 [[LCMP_MOD]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY_EPIL:%.*]]
	; CHECK: for.body.epil:			; CHECK: for.body.epil:
	; CHECK-NEXT: [[G_06_EPIL:%.*]] = phi i64 [ [[ADD_EPIL:%.*]], [[FOR_BODY_EPIL]] ], [ [[ADD_LCSSA_PH]], [[FOR_COND_CLEANUP_LOOPEXIT_UNR_LCSSA]] ]			; CHECK-NEXT: [[G_06_EPIL:%.*]] = phi i64 [ [[ADD_EPIL:%.*]], [[FOR_BODY_EPIL]] ], [ [[ADD_LCSSA_PH]], [[FOR_COND_CLEANUP_LOOPEXIT_UNR_LCSSA]] ]
	; CHECK-NEXT: [[K_05_EPIL:%.*]] = phi i64 [ [[AND_EPIL:%.*]], [[FOR_BODY_EPIL]] ], [ [[K_05_UNR]], [[FOR_COND_CLEANUP_LOOPEXIT_UNR_LCSSA]] ]			; CHECK-NEXT: [[K_05_EPIL:%.*]] = phi i64 [ [[AND_EPIL:%.*]], [[FOR_BODY_EPIL]] ], [ [[K_05_UNR]], [[FOR_COND_CLEANUP_LOOPEXIT_UNR_LCSSA]] ]
	; CHECK-NEXT: [[EPIL_ITER:%.*]] = phi i64 [ [[EPIL_ITER_SUB:%.*]], [[FOR_BODY_EPIL]] ], [ [[XTRAITER]], [[FOR_COND_CLEANUP_LOOPEXIT_UNR_LCSSA]] ]			; CHECK-NEXT: [[EPIL_ITER:%.*]] = phi i64 [ [[EPIL_ITER_SUB:%.*]], [[FOR_BODY_EPIL]] ], [ [[XTRAITER]], [[FOR_COND_CLEANUP_LOOPEXIT_UNR_LCSSA]] ]
	; CHECK-NEXT: [[AND_EPIL]] = and i64 [[CONV]], [[K_05_EPIL]]			; CHECK-NEXT: [[AND_EPIL]] = and i64 [[K_05_EPIL]], [[CONV]]
	; CHECK-NEXT: [[ADD_EPIL]] = add i64 [[AND_EPIL]], [[G_06_EPIL]]			; CHECK-NEXT: [[ADD_EPIL]] = add i64 [[AND_EPIL]], [[G_06_EPIL]]
	; CHECK-NEXT: [[EPIL_ITER_SUB]] = add i64 [[EPIL_ITER]], -1			; CHECK-NEXT: [[EPIL_ITER_SUB]] = add i64 [[EPIL_ITER]], -1
	; CHECK-NEXT: [[EPIL_ITER_CMP:%.*]] = icmp eq i64 [[EPIL_ITER_SUB]], 0			; CHECK-NEXT: [[EPIL_ITER_CMP:%.*]] = icmp eq i64 [[EPIL_ITER_SUB]], 0
	; CHECK-NEXT: br i1 [[EPIL_ITER_CMP]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY_EPIL]], !llvm.loop !0			; CHECK-NEXT: br i1 [[EPIL_ITER_CMP]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY_EPIL]], !llvm.loop !0
	; CHECK: for.cond.cleanup:			; CHECK: for.cond.cleanup:
	; CHECK-NEXT: [[G_0_LCSSA:%.*]] = phi i64 [ undef, [[ENTRY:%.*]] ], [ [[ADD_LCSSA_PH]], [[FOR_COND_CLEANUP_LOOPEXIT_UNR_LCSSA]] ], [ [[ADD_EPIL]], [[FOR_BODY_EPIL]] ]			; CHECK-NEXT: [[G_0_LCSSA:%.*]] = phi i64 [ undef, [[ENTRY:%.*]] ], [ [[ADD_LCSSA_PH]], [[FOR_COND_CLEANUP_LOOPEXIT_UNR_LCSSA]] ], [ [[ADD_EPIL]], [[FOR_BODY_EPIL]] ]
	; CHECK-NEXT: ret i64 [[G_0_LCSSA]]			; CHECK-NEXT: ret i64 [[G_0_LCSSA]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[G_06:%.*]] = phi i64 [ undef, [[FOR_BODY_LR_PH_NEW]] ], [ [[ADD_7]], [[FOR_BODY]] ]			; CHECK-NEXT: [[G_06:%.*]] = phi i64 [ undef, [[FOR_BODY_LR_PH_NEW]] ], [ [[ADD_7]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[K_05:%.*]] = phi i64 [ 1, [[FOR_BODY_LR_PH_NEW]] ], [ [[AND]], [[FOR_BODY]] ]			; CHECK-NEXT: [[K_05:%.*]] = phi i64 [ 1, [[FOR_BODY_LR_PH_NEW]] ], [ [[AND]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[NITER:%.*]] = phi i64 [ [[UNROLL_ITER]], [[FOR_BODY_LR_PH_NEW]] ], [ [[NITER_NSUB_7:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[NITER:%.*]] = phi i64 [ [[UNROLL_ITER]], [[FOR_BODY_LR_PH_NEW]] ], [ [[NITER_NSUB_7:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[AND]] = and i64 [[CONV]], [[K_05]]			; CHECK-NEXT: [[AND]] = and i64 [[K_05]], [[CONV]]
	; CHECK-NEXT: [[ADD:%.*]] = add i64 [[AND]], [[G_06]]			; CHECK-NEXT: [[FACTOR:%.*]] = mul i64 [[AND]], 8
	; CHECK-NEXT: [[ADD_1:%.*]] = add i64 [[AND]], [[ADD]]			; CHECK-NEXT: [[ADD_7]] = add i64 [[FACTOR]], [[G_06]]
	; CHECK-NEXT: [[ADD_2:%.*]] = add i64 [[AND]], [[ADD_1]]
	; CHECK-NEXT: [[ADD_3:%.*]] = add i64 [[AND]], [[ADD_2]]
	; CHECK-NEXT: [[ADD_4:%.*]] = add i64 [[AND]], [[ADD_3]]
	; CHECK-NEXT: [[ADD_5:%.*]] = add i64 [[AND]], [[ADD_4]]
	; CHECK-NEXT: [[ADD_6:%.*]] = add i64 [[AND]], [[ADD_5]]
	; CHECK-NEXT: [[ADD_7]] = add i64 [[AND]], [[ADD_6]]
	; CHECK-NEXT: [[NITER_NSUB_7]] = add i64 [[NITER]], -8			; CHECK-NEXT: [[NITER_NSUB_7]] = add i64 [[NITER]], -8
	; CHECK-NEXT: [[NITER_NCMP_7:%.*]] = icmp eq i64 [[NITER_NSUB_7]], 0			; CHECK-NEXT: [[NITER_NCMP_7:%.*]] = icmp eq i64 [[NITER_NSUB_7]], 0
	; CHECK-NEXT: br i1 [[NITER_NCMP_7]], label [[FOR_COND_CLEANUP_LOOPEXIT_UNR_LCSSA]], label [[FOR_BODY]]			; CHECK-NEXT: br i1 [[NITER_NCMP_7]], label [[FOR_COND_CLEANUP_LOOPEXIT_UNR_LCSSA]], label [[FOR_BODY]]
	;			;
	; NPM-LABEL: @func(			; NPM-LABEL: @func(
	; NPM-NEXT: entry:			; NPM-NEXT: entry:
	; NPM-NEXT: [[CMP4:%.*]] = icmp eq i64 [[LIMIT:%.*]], 0			; NPM-NEXT: [[CMP4:%.*]] = icmp eq i64 [[LIMIT:%.*]], 0
	; NPM-NEXT: br i1 [[CMP4]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY_LR_PH:%.*]]			; NPM-NEXT: br i1 [[CMP4]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY_LR_PH:%.*]]
	; NPM-NEXT: [[ADD_LCSSA_PH:%.*]] = phi i64 [ undef, [[FOR_BODY_LR_PH]] ], [ [[ADD_7:%.*]], [[FOR_BODY]] ]			; NPM-NEXT: [[ADD_LCSSA_PH:%.*]] = phi i64 [ undef, [[FOR_BODY_LR_PH]] ], [ [[ADD_7:%.*]], [[FOR_BODY]] ]
	; NPM-NEXT: [[K_05_UNR:%.*]] = phi i64 [ 1, [[FOR_BODY_LR_PH]] ], [ [[AND_PHI:%.*]], [[FOR_BODY]] ]			; NPM-NEXT: [[K_05_UNR:%.*]] = phi i64 [ 1, [[FOR_BODY_LR_PH]] ], [ [[AND_PHI:%.*]], [[FOR_BODY]] ]
	; NPM-NEXT: [[LCMP_MOD:%.*]] = icmp eq i64 [[XTRAITER]], 0			; NPM-NEXT: [[LCMP_MOD:%.*]] = icmp eq i64 [[XTRAITER]], 0
	; NPM-NEXT: br i1 [[LCMP_MOD]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY_EPIL:%.*]]			; NPM-NEXT: br i1 [[LCMP_MOD]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY_EPIL:%.*]]
	; NPM: for.body.epil:			; NPM: for.body.epil:
	; NPM-NEXT: [[G_06_EPIL:%.*]] = phi i64 [ [[ADD_EPIL:%.*]], [[FOR_BODY_EPIL]] ], [ [[ADD_LCSSA_PH]], [[FOR_COND_CLEANUP_LOOPEXIT_UNR_LCSSA]] ]			; NPM-NEXT: [[G_06_EPIL:%.*]] = phi i64 [ [[ADD_EPIL:%.*]], [[FOR_BODY_EPIL]] ], [ [[ADD_LCSSA_PH]], [[FOR_COND_CLEANUP_LOOPEXIT_UNR_LCSSA]] ]
	; NPM-NEXT: [[K_05_EPIL:%.*]] = phi i64 [ [[AND_EPIL:%.*]], [[FOR_BODY_EPIL]] ], [ [[K_05_UNR]], [[FOR_COND_CLEANUP_LOOPEXIT_UNR_LCSSA]] ]			; NPM-NEXT: [[K_05_EPIL:%.*]] = phi i64 [ [[AND_EPIL:%.*]], [[FOR_BODY_EPIL]] ], [ [[K_05_UNR]], [[FOR_COND_CLEANUP_LOOPEXIT_UNR_LCSSA]] ]
	; NPM-NEXT: [[EPIL_ITER:%.*]] = phi i64 [ [[EPIL_ITER_SUB:%.*]], [[FOR_BODY_EPIL]] ], [ [[XTRAITER]], [[FOR_COND_CLEANUP_LOOPEXIT_UNR_LCSSA]] ]			; NPM-NEXT: [[EPIL_ITER:%.*]] = phi i64 [ [[EPIL_ITER_SUB:%.*]], [[FOR_BODY_EPIL]] ], [ [[XTRAITER]], [[FOR_COND_CLEANUP_LOOPEXIT_UNR_LCSSA]] ]
	; NPM-NEXT: [[AND_EPIL]] = and i64 [[CONV]], [[K_05_EPIL]]			; NPM-NEXT: [[AND_EPIL]] = and i64 [[K_05_EPIL]], [[CONV]]
	; NPM-NEXT: [[ADD_EPIL]] = add i64 [[AND_EPIL]], [[G_06_EPIL]]			; NPM-NEXT: [[ADD_EPIL]] = add i64 [[AND_EPIL]], [[G_06_EPIL]]
	; NPM-NEXT: [[EPIL_ITER_SUB]] = add i64 [[EPIL_ITER]], -1			; NPM-NEXT: [[EPIL_ITER_SUB]] = add i64 [[EPIL_ITER]], -1
	; NPM-NEXT: [[EPIL_ITER_CMP:%.*]] = icmp eq i64 [[EPIL_ITER_SUB]], 0			; NPM-NEXT: [[EPIL_ITER_CMP:%.*]] = icmp eq i64 [[EPIL_ITER_SUB]], 0
	; NPM-NEXT: br i1 [[EPIL_ITER_CMP]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY_EPIL]], !llvm.loop !0			; NPM-NEXT: br i1 [[EPIL_ITER_CMP]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY_EPIL]], !llvm.loop !0
	; NPM: for.cond.cleanup:			; NPM: for.cond.cleanup:
	; NPM-NEXT: [[G_0_LCSSA:%.*]] = phi i64 [ undef, [[ENTRY:%.*]] ], [ [[ADD_LCSSA_PH]], [[FOR_COND_CLEANUP_LOOPEXIT_UNR_LCSSA]] ], [ [[ADD_EPIL]], [[FOR_BODY_EPIL]] ]			; NPM-NEXT: [[G_0_LCSSA:%.*]] = phi i64 [ undef, [[ENTRY:%.*]] ], [ [[ADD_LCSSA_PH]], [[FOR_COND_CLEANUP_LOOPEXIT_UNR_LCSSA]] ], [ [[ADD_EPIL]], [[FOR_BODY_EPIL]] ]
	; NPM-NEXT: ret i64 [[G_0_LCSSA]]			; NPM-NEXT: ret i64 [[G_0_LCSSA]]
	; NPM: for.body:			; NPM: for.body:
	; NPM-NEXT: [[G_06:%.*]] = phi i64 [ undef, [[FOR_BODY_LR_PH_NEW]] ], [ [[ADD_7]], [[FOR_BODY_FOR_BODY_CRIT_EDGE:%.*]] ]			; NPM-NEXT: [[G_06:%.*]] = phi i64 [ undef, [[FOR_BODY_LR_PH_NEW]] ], [ [[ADD_7]], [[FOR_BODY_FOR_BODY_CRIT_EDGE:%.*]] ]
	; NPM-NEXT: [[AND_PHI]] = phi i64 [ [[AND_0]], [[FOR_BODY_LR_PH_NEW]] ], [ [[AND_1:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE]] ]			; NPM-NEXT: [[AND_PHI]] = phi i64 [ [[AND_0]], [[FOR_BODY_LR_PH_NEW]] ], [ [[AND_1:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE]] ]
	; NPM-NEXT: [[NITER:%.*]] = phi i64 [ [[UNROLL_ITER]], [[FOR_BODY_LR_PH_NEW]] ], [ [[NITER_NSUB_7:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE]] ]			; NPM-NEXT: [[NITER:%.*]] = phi i64 [ [[UNROLL_ITER]], [[FOR_BODY_LR_PH_NEW]] ], [ [[NITER_NSUB_7:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE]] ]
	; NPM-NEXT: [[ADD:%.*]] = add i64 [[AND_PHI]], [[G_06]]			; NPM-NEXT: [[FACTOR:%.*]] = mul i64 [[AND_PHI]], 8
	; NPM-NEXT: [[ADD_1:%.*]] = add i64 [[AND_PHI]], [[ADD]]			; NPM-NEXT: [[ADD_7]] = add i64 [[FACTOR]], [[G_06]]
	; NPM-NEXT: [[ADD_2:%.*]] = add i64 [[AND_PHI]], [[ADD_1]]
	; NPM-NEXT: [[ADD_3:%.*]] = add i64 [[AND_PHI]], [[ADD_2]]
	; NPM-NEXT: [[ADD_4:%.*]] = add i64 [[AND_PHI]], [[ADD_3]]
	; NPM-NEXT: [[ADD_5:%.*]] = add i64 [[AND_PHI]], [[ADD_4]]
	; NPM-NEXT: [[ADD_6:%.*]] = add i64 [[AND_PHI]], [[ADD_5]]
	; NPM-NEXT: [[ADD_7]] = add i64 [[AND_PHI]], [[ADD_6]]
	; NPM-NEXT: [[NITER_NSUB_7]] = add i64 [[NITER]], -8			; NPM-NEXT: [[NITER_NSUB_7]] = add i64 [[NITER]], -8
	; NPM-NEXT: [[NITER_NCMP_7:%.*]] = icmp eq i64 [[NITER_NSUB_7]], 0			; NPM-NEXT: [[NITER_NCMP_7:%.*]] = icmp eq i64 [[NITER_NSUB_7]], 0
	; NPM-NEXT: br i1 [[NITER_NCMP_7]], label [[FOR_COND_CLEANUP_LOOPEXIT_UNR_LCSSA]], label [[FOR_BODY_FOR_BODY_CRIT_EDGE]]			; NPM-NEXT: br i1 [[NITER_NCMP_7]], label [[FOR_COND_CLEANUP_LOOPEXIT_UNR_LCSSA]], label [[FOR_BODY_FOR_BODY_CRIT_EDGE]]
	; NPM: for.body.for.body_crit_edge:			; NPM: for.body.for.body_crit_edge:
	; NPM-NEXT: [[AND_1]] = and i64 [[CONV]], [[AND_PHI]]			; NPM-NEXT: [[AND_1]] = and i64 [[AND_PHI]], [[CONV]]
	; NPM-NEXT: br label [[FOR_BODY]]			; NPM-NEXT: br label [[FOR_BODY]]
	;			;
	entry:			entry:
	%blah.addr = alloca i64, align 8			%blah.addr = alloca i64, align 8
	%limit.addr = alloca i64, align 8			%limit.addr = alloca i64, align 8
	%k = alloca i32, align 4			%k = alloca i32, align 4
	%g = alloca i64, align 8			%g = alloca i64, align 8
	%i = alloca i64, align 8			%i = alloca i64, align 8
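The for.body change in reassociate-after-unroll.ll replaces eight chained adds of the same value with one `mul i64 ..., 8` followed by a single add. A minimal C++ sketch of why the two forms agree under wrapping 64-bit arithmetic follows; the function names are hypothetical and are not part of the patch, and unsigned `std::uint64_t` is used here as an assumption to model the IR's `add`/`mul` without `nsw`/`nuw` flags.

```cpp
#include <cstdint>

// Hypothetical model of the unrolled body before the extra Reassociate run:
// eight chained adds of the same value v onto the accumulator g.
std::uint64_t chained_adds(std::uint64_t g, std::uint64_t v) {
    std::uint64_t acc = g;
    for (int i = 0; i < 8; ++i)
        acc += v;  // one add per unrolled iteration, wraps modulo 2^64
    return acc;
}

// Hypothetical model of the body after Reassociate: one multiply, one add.
std::uint64_t factored(std::uint64_t g, std::uint64_t v) {
    return v * 8 + g;  // wraps modulo 2^64, same as the chained form
}

// Convenience check used to compare the two forms on a given input.
bool forms_agree(std::uint64_t g, std::uint64_t v) {
    return chained_adds(g, v) == factored(g, v);
}
```

Because both forms are computed modulo 2^64, they agree even when the intermediate sums overflow, which is why the rewrite needs no overflow flags on the original adds.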

Diff 199355