This is an archive of the discontinued LLVM Phabricator instance.


Differential D109958

[LoopFlatten] Enable it by default
Closed · Public

Authored by SjoerdMeijer on Sep 17 2021, 3:31 AM.

Details

Reviewers

dmgreen
nikic
fhahn
lebedev.ri
alanphipps
alexey.zhikhar
efriedma
Whitney
bmahjour
xbolva00

Commits

rG233659c7ae9b: [LoopFlatten] Enable it by default

Summary

LoopFlatten improves a well-known embedded benchmark with highly popular industry applications by a few percentage points, but it is not restricted to optimising a single benchmark case. Below are results for the LLVM test suite and the number of loops it flattened:

Test                                                                       # Loops flattened
--------------------------------------------------------------------------------------------
MultiSource/Applications/JM/lencod/lencod                                  3
MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg                       1
MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg                 3
MultiSource/Applications/JM/ldecod/ldecod                                  1       
MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000                         3       
MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4                              17      
SingleSource/Benchmarks/Misc/himenobmtxpa                                  2       
MicroBenchmarks/ImageProcessing/AnisotropicDiffusion/AnisotropicDiffusion  2
MicroBenchmarks/ImageProcessing/BilateralFiltering/BilateralFilter         2
MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG                    20
MultiSource/Benchmarks/Rodinia/pathfinder/pathfinder                       1
MicroBenchmarks/ImageProcessing/Blur/blur                                  2
MicroBenchmarks/ImageProcessing/Dither/Dither                              2
MicroBenchmarks/ImageProcessing/Dilate/Dilate                              2
MultiSource/Benchmarks/DOE-ProxyApps-C++/HPCCG/HPCCG                       1
MultiSource/Benchmarks/DOE-ProxyApps-C/SimpleMOC/SimpleMOC                 1
MicroBenchmarks/ImageProcessing/Interpolation/Interpolation                2
MultiSource/Benchmarks/ASC_Sequoia/AMGmk/AMGmk                             2
MultiSource/Benchmarks/Rodinia/backprop/backprop                           1
----------------------------------------------------------------------------------------------
Total                                                                     68

While the implementation of LoopFlatten recognises a few patterns and could be made more generic, I believe these numbers show that it's generic enough to trigger on a wide variety of code bases, making it worthwhile to enable it by default.
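
As a rough illustration of the kind of pattern being counted here, the sketch below shows a nested loop whose induction variables are only used to form the linear index i * M + j, next to its flattened single-loop equivalent. This is hand-written C++ rather than output from the pass, and the names sumNested and sumFlattened are hypothetical; the actual pass works on LLVM IR and recognises only a few such shapes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Nested form: i and j are only used together as the linear index i * M + j.
long sumNested(const std::vector<int> &A, std::size_t N, std::size_t M) {
  long Sum = 0;
  for (std::size_t i = 0; i < N; ++i)
    for (std::size_t j = 0; j < M; ++j)
      Sum += A[i * M + j];
  return Sum;
}

// Flattened form: one loop over N * M iterations, with no inner loop.
// Only equivalent when N * M does not overflow the induction variable type.
long sumFlattened(const std::vector<int> &A, std::size_t N, std::size_t M) {
  long Sum = 0;
  for (std::size_t k = 0; k < N * M; ++k)
    Sum += A[k];
  return Sum;
}
```

Both functions return the same sum whenever N * M does not overflow, which is the condition that has to hold before the two loops can be merged.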

LoopFlatten is a relatively simple pass: for example, it doesn't implement a computationally expensive algorithm and doesn't require more analysis than a typical loop pass. Compile times for the LLVM test suite (ClamAV, 7zip, tramp3d-v4, kimwitu++, sqlite3, mafft, SPASS, lencod, Bullet) show a very minor increase of ~0.04% to 0.28%. There are cases where compile times improve, but I haven't analysed them and don't want to claim that it will improve compile times in general.

We have had LoopFlatten enabled by default downstream for many years now, so it should have had a lot of exposure and usage, and we are not aware of any problems.

Diff Detail

Event Timeline

SjoerdMeijer created this revision. Sep 17 2021, 3:31 AM

Herald added subscribers: ormris, wenlei, steven_wu, hiraditya. Sep 17 2021, 3:31 AM

SjoerdMeijer requested review of this revision. Sep 17 2021, 3:31 AM

Herald added a project: Restricted Project. Sep 17 2021, 3:31 AM

SjoerdMeijer added a subscriber: RosieSumpter. Sep 17 2021, 3:34 AM

Compile-time: https://llvm-compile-time-tracker.com/compare.php?from=0fc624f029f568e91caf74d90abc5d8d971151c2&to=60d33cf095caea659c46e6a026ecf8899b3d60be&stat=instructions

I wasn't able to get numbers for the legacy PM, because this seems to either cause or expose a crash in IndVarSimplify on z51.c from consumer-typeset: https://llvm-compile-time-tracker.com/show_error.php?commit=60d33cf095caea659c46e6a026ecf8899b3d60be That should be resolved before landing this.

llvm/test/Other/new-pm-defaults.ll
168	As a drive-by comment for @asbirlea and @aeubanks, it looks like there's an opportunity here to not rerun LoopSimplify/LCSSA if we run multiple Loop/LoopNest pass managers back to back?

Harbormaster completed remote builds in B124362: Diff 373180. Sep 17 2021, 4:09 AM

In D109958#3005976, @nikic wrote:

Compile-time: https://llvm-compile-time-tracker.com/compare.php?from=0fc624f029f568e91caf74d90abc5d8d971151c2&to=60d33cf095caea659c46e6a026ecf8899b3d60be&stat=instructions

I wasn't able to get numbers for the legacy PM, because this seems to either cause or expose a crash in IndVarSimplify on z51.c from consumer-typeset: https://llvm-compile-time-tracker.com/show_error.php?commit=60d33cf095caea659c46e6a026ecf8899b3d60be That should be resolved before landing this.

Many thanks for running and confirming those numbers @nikic!
And yes, I will have a look to see what's going on with that crash.

xbolva00 added a subscriber: xbolva00. Sep 17 2021, 5:11 AM

Nice! I think this is a good thing.

I have checked this on some code i care about and as far as i can tell
it pretty much always fails with Loop::isCanonical(), e.g. because
the loops produced by OpenMP lowering are non-canonical by definition.
This is probably not a blocker, but it does mean that the true impact
of this change (aka, does it regress cases and require costmodel tuning?)
will not be apparent until later.

In D109958#3006066, @lebedev.ri wrote:

Nice! I think this is a good thing.

I have checked this on some code i care about and as far as i can tell
it pretty much always fails with Loop::isCanonical(), e.g. because
the loops produced by OpenMP lowering are non-canonical by definition.
This is probably not a blocker, but it does mean that the true impact
of this change (aka, does it regress cases and require costmodel tuning?)
will not be apparent until later.

Yeah, by design it is fairly specific in the cases it supports (but still triggers a lot). For example, PR40581 is about a case we don't yet support. If you have a different case, I would be happy if you can raise a PR, then I can have a look after this lands.

aeubanks added inline comments. Sep 17 2021, 9:20 AM

llvm/test/Other/new-pm-defaults.ll
168	We need to fix where we add `LoopFlattenPass` in PassBuilderPipelines.cpp. Instead of

if (EnableLoopFlatten)
  FPM.addPass(createFunctionToLoopPassAdaptor(LoopFlattenPass()));

we should move it earlier to

if (EnableLoopFlatten)
  LPM2.addPass(LoopFlattenPass());

It would be good to have some performance testing for this too.

asbirlea added inline comments. Sep 17 2021, 12:41 PM

llvm/test/Other/new-pm-defaults.ll
168	+1 it should be added to a LPM.

nikic added inline comments. Sep 17 2021, 2:12 PM

llvm/test/Other/new-pm-defaults.ll
168	Oh, I wasn't aware that loop passes and loop nest passes can run in the same pass manager. That makes more sense indeed.

fwiw, I'm seeing a crash during bootstrap build, so this needs more testing for sure.

SjoerdMeijer added inline comments. Sep 20 2021, 12:19 AM

llvm/test/Other/new-pm-defaults.ll
168	Thanks for commenting on this and for the suggestions! LoopFlatten was intentionally added where it currently lives for the legacy pass manager (LPM). LoopFlatten removes an inner loop, and a loop pass was not able to deal with that very well under the LPM. But this is probably out of date for the NPM, and LoopFlatten was made a LoopNest pass in D102904, so I guess that means we can just add it where you suggested.

SjoerdMeijer mentioned this in D110057: [LoopFlatten] Move it to a LoopPassManager. Sep 20 2021, 3:26 AM

I am addressing the crash in D110234.

With the bootstrap failure fixed in D110234 and another recently raised issue D110712, and having tested this more, I would like to pick this up again.

It would be good to have some performance testing for this too.

Like I mentioned in the description, this gives a really good improvement on an embedded benchmark, but is generic enough to trigger a lot in, for example, the LLVM test suite (and other code bases). Because LoopFlatten removes an inner loop, it is unlikely to make things worse; it should be a case of "the same or better performance". Supporting this with some data:

Test	# flattened loops	% diff
MultiSource/Applications/JM/lencod/lencod.test	3	-0.28
MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg.test	1	-9.04
MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test	3	-5.75
MultiSource/Applications/JM/ldecod/ldecod.test	1	0.97
MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test	3	0.29
MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test	17	-0.37
SingleSource/Benchmarks/Misc/himenobmtxpa.test	2	0.09
MicroBenchmarks/ImageProcessing/AnisotropicDiffusion/AnisotropicDiffusion.test/32	2	-1.27
MicroBenchmarks/ImageProcessing/AnisotropicDiffusion/AnisotropicDiffusion.test/64	2	-0.47
MicroBenchmarks/ImageProcessing/AnisotropicDiffusion/AnisotropicDiffusion.test/128	2	-0.24
MicroBenchmarks/ImageProcessing/AnisotropicDiffusion/AnisotropicDiffusion.test/256	2	-0.18
MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test	20	-0.21
MultiSource/Benchmarks/Rodinia/pathfinder/pathfinder.test	1	-0.84
MultiSource/Benchmarks/DOE-ProxyApps-C++/HPCCG/HPCCG.test	1	0.50
MultiSource/Benchmarks/DOE-ProxyApps-C/SimpleMOC/SimpleMOC.test	1	0.11
MultiSource/Benchmarks/ASC_Sequoia/AMGmk/AMGmk.test	2	-1.39
MultiSource/Benchmarks/Rodinia/backprop/backprop.test	1	-0.45

Negative numbers are reductions in execution time, so lower is better.
Take these numbers with a grain of salt because the test suite can be a bit noisy. But like I said, I think the takeaway message is that LoopFlatten is a nice simplification doing some good here and there (I actually haven't paid attention to it, but it should help code size a bit too, I guess).

What do we think of this?

Try to reland this?

AMDGPU test fails? Please fix.

In D109958#3047849, @xbolva00 wrote:

Try to reland this?

Thanks for looking again at this. This was never committed, so I am looking for an LGTM to do so. :)
But perhaps I didn't understand.

In D109958#3047850, @xbolva00 wrote:

AMDGPU test fails? Please fix.

Ah yeah, thanks. My mistake, I didn't build with the amdgpu backend enabled, so didn't catch this. Will fix.

Fixed the amdgpu test.

Herald added subscribers: kerbowa, nhaehnle, jvesely. Oct 7 2021, 7:45 AM

Harbormaster completed remote builds in B127523: Diff 377848. Oct 7 2021, 8:34 AM

Can you also update release notes?

Updated the ReleaseNotes.

Many thanks @nikic and @aeubanks for sorting out the "preserved analysis" side of things here, really appreciate it! With this fixed, i.e. D111350 and D111328, are we happy for this to go in too?

nikic added inline comments. Oct 11 2021, 1:24 AM

llvm/test/CodeGen/AMDGPU/opt-pipeline.ll
164	I'm somewhat confused by what is going on here. Why do we now calculate MemorySSA and why does LoopUnroll get split into a separate LPM?

Harbormaster completed remote builds in B128051: Diff 378586. Oct 11 2021, 1:58 AM

SjoerdMeijer added inline comments. Oct 11 2021, 2:26 AM

llvm/test/CodeGen/AMDGPU/opt-pipeline.ll
164	I have no idea, and accepted this as something the pass manager decided to do... I am also confused about both things: I have no idea why we need to rerun MemorySSA, and don't see why LoopUnroll is now run separately. I will look into this and see if I can get any wiser here.

SjoerdMeijer added inline comments. Oct 11 2021, 6:06 AM

llvm/test/CodeGen/AMDGPU/opt-pipeline.ll

164

In lib/Passes/PassBuilderPipelines.cpp we have this:

if (EnableLoopFlatten)
  FPM.addPass(createFunctionToLoopPassAdaptor(LoopFlattenPass()));
// The loop passes in LPM2 (LoopFullUnrollPass) do not preserve MemorySSA.
// *All* loop passes must preserve it, in order to be able to use it.
FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM2),
                                            /*UseMemorySSA=*/false,
                                            /*UseBlockFrequencyInfo=*/false));

Since this is talking about MemorySSA this might be related, but there are so many things going on here and I am still looking, so don't know for certain. If e.g. @aeubanks has some tips or suggestions here, I would be happy to receive them. :)

aeubanks added inline comments. Oct 11 2021, 10:31 AM

llvm/test/CodeGen/AMDGPU/opt-pipeline.ll
164	This is a legacy PM test. I'm not super familiar with the legacy PM, but it's probably something to do with the fact that the legacy loop flatten pass is a function pass and perhaps doesn't preserve some analyses? Could we perhaps just make this change for the new PM?

SjoerdMeijer added inline comments. Oct 11 2021, 11:10 AM

llvm/test/CodeGen/AMDGPU/opt-pipeline.ll
164	I'm not super familiar with the legacy PM, but it's probably something to do with the fact that the legacy loop flatten pass is a function pass and perhaps doesn't preserve some analyses?

Yep, that's what I thought too, but didn't get to the bottom of it and can't "prove" it.

Could we perhaps just make this change for the new PM?

Thanks for the suggestion, I am happy to enable LoopFlatten by default only for the new PM, so will look into that. Diverging is probably not ideal, but since the legacy PM is deprecated I am happy to do that. The alternative is to "accept" that this is what the legacy PM does, but looks like we are not happy with that, so again, will do this only for the new PM.

Thanks for the suggestion, I am happy to enable LoopFlatten by default only for the new PM, so will look into that.

My team presently relies on loop flattening with a downstream compiler, and we presently use the LPM and will move to the NPM after the 14.x branch. It would be great if this functionality didn't diverge between the two, although I understand the frustration. At the very least, I would like to understand what is different, and we could perhaps enable by default downstream until we migrate to the NPM.

In D109958#3055809, @alanphipps wrote:

Thanks for the suggestion, I am happy to enable LoopFlatten by default only for the new PM, so will look into that.

My team presently relies on loop flattening with a downstream compiler, and we presently use the LPM and will move to the NPM after the 14.x branch. It would be great if this functionality didn't diverge between the two, although I understand the frustration. At the very least, I would like to understand what is different, and we could perhaps enable by default downstream until we migrate to the NPM.

There are already a good number of divergences between the two pass managers, I don't see the harm in keeping the LPM pipeline the same. Do you specifically care about this pass?

The changes to the test show that this would affect execution order of passes which could actually affect optimizations. People have requested that the LPM not regress even though it's deprecated, and I'm hesitant to touch LPM pass execution order.
I'd say that if somebody cares about this pass and is using the LPM, they can investigate the weirdness and turn it on in a separate change for the LPM.

There are already a good number of divergences between the two pass managers, I don't see the harm in keeping the LPM pipeline the same. Do you specifically care about this pass?

Yes, we asked about loop flattening a year ago along with a jump threading improvement in the context of improving the Coremark, and @SjoerdMeijer indicated that there was a pending code review. I can appreciate that there are already divergences between the two pass managers. If there is a possibility of regression on the LPM in this case, then I can understand not enabling it by default. We can explore enabling it downstream, but any other information about impact to the LPM would be helpful and appreciated. Thanks!

In D109958#3055878, @alanphipps wrote:

There are already a good number of divergences between the two pass managers, I don't see the harm in keeping the LPM pipeline the same. Do you specifically care about this pass?

Yes, we asked about loop flattening a year ago along with a jump threading improvement in the context of improving the Coremark, and @SjoerdMeijer indicated that there was a pending code review. I can appreciate that there are already divergences between the two pass managers. If there is a possibility of regression on the LPM in this case, then I can understand not enabling it by default. We can explore enabling it downstream, but any other information about impact to the LPM would be helpful and appreciated. Thanks!

We could make the flag only affect the LPM and forcibly enable loop flatten in the NPM without a flag to turn it off. That way it'd be easier for downstream users to evaluate it with the LPM.

I am not sure I entirely follow, @alanphipps. We also rely on this pass, and for many years we have been using this with the LPM, and since the switch we are now using it under the NPM. This helps a benchmark case (and the other things I mentioned), and both under the LPM and the NPM this will trigger, so I am not sure what regressions we are talking about. Also, if you're using this downstream, like us, then I assume you have a downstream change to enable it by default, just like us. So if we are going to enable this by default under the NPM, I am not sure if anything will change for you. But either way, I am happy to make this change:

We could make the flag only affect the LPM and forcibly enable loop flatten in the NPM without a flag to turn it off. That way it'd be easier for downstream users to evaluate it with the LPM.

thanks for the suggestion.

asbirlea mentioned this in D111578: [LoopSimplifyCFG] Do not require MSSA. Continue to preserve if available. Oct 11 2021, 1:56 PM

asbirlea added inline comments. Oct 11 2021, 1:58 PM

llvm/test/CodeGen/AMDGPU/opt-pipeline.ll
164	The reason for this change in pipeline is where LoopFlatten is added in the legacy pass manager, along with LoopSimplifyCFG (llvm/lib/Transforms/IPO/PassManagerBuilder.cpp:472):

// We resume loop passes creating a second loop pipeline here.
if (EnableLoopFlatten) {
  MPM.add(createLoopFlattenPass()); // Flatten loops
  MPM.add(createLoopSimplifyCFGPass());
}

It looks like LoopSimplifyCFG requires MemorySSA, which explains why that is being computed and why the loop pipeline is split (LoopUnroll does not preserve MemorySSA, so it's getting split out).
I don't think LoopSimplifyCFG should require MemorySSA, just preserve it if available. D111578 should simplify these tests.

In D109958#3056240, @SjoerdMeijer wrote:

I assume you have a downstream change to enable it by default, just like us. So if we are going to enable this by default under the NPM, I am not sure if anything will change for you

Yes, fair enough -- rereading the discussion, I was concerned that the MemorySSA issue for the LPM wasn't well understood and perhaps something was being overlooked in the implementation, but it sounds like the concern is more pertinent to performance impact/fluctuation on the LPM and not the implementation itself, which is complete. I think we are good enabling it downstream.

asbirlea mentioned this in rGf7ca54289c14: [LoopSimplifyCFG] Do not require MSSA. Continue to preserve if available. Oct 11 2021, 2:28 PM

Reping.

Enable only for npm? lpm is dead and will be removed anyway.

Marking this as blocked on D110057 for now.

This revision now requires changes to proceed. Dec 29 2021, 9:39 AM

Thanks for the ping. I really do want to finish this, but need to find some courage to deal with the pass manager business in D110057. ;-) But am going to take a look now.

SjoerdMeijer mentioned this in rG86825fc2fb36: [LoopFlatten] Move it to a LoopPassManager. Dec 30 2021, 4:32 AM

With D110057 committed, is this now good to go too?

(I will check the test changes here after landing D110057, haven’t done that yet)

New compile-time: https://llvm-compile-time-tracker.com/compare.php?from=6f45fe9851c673883b3a258351ee4997aa2c028c&to=439d1c9613828db24ad03eb415baad2e76b8913c&stat=instructions Looks ok to me.

We may want to drop the LoopSimplifyCFG run from the LegacyPM pipeline, as that doesn't match NewPM and seems to account for the actual codegen impact of this change (LoopFlatten itself doesn't seem to be doing much in practice).

Update of the tests after D110057.

In D109958#3214802, @nikic wrote:

New compile-time: https://llvm-compile-time-tracker.com/compare.php?from=6f45fe9851c673883b3a258351ee4997aa2c028c&to=439d1c9613828db24ad03eb415baad2e76b8913c&stat=instructions Looks ok to me.

Nice, thank you for sharing!
That indeed looks good I think.

We may want to drop the LoopSimplifyCFG run from the LegacyPM pipeline, as that doesn't match NewPM and seems to account for the actual codegen impact of this change (LoopFlatten itself doesn't seem to be doing much in practice).

I will look into that. I think I can do that separately.

Harbormaster completed remote builds in B141095: Diff 396770. Dec 31 2021, 4:07 AM

I think all points have been addressed, so am looking for an LGTM .... anyone interested? :-)

Matt added a subscriber: Matt. Jan 5 2022, 3:21 PM

Ping

With your help, we have fixed quite a few things, like updating the MemorySSA state and moving it to a LoopPassManager (and fixing a performance regression related to that).

With these things fixed, are we happy to enable LoopFlatten by default?

Thank you for resolving all the issues raised! Could you rebase this patch on ToT? (to include the MSSA update and the move to LPM1?)

I'm planning to run a performance check on this, to see what the expectation is for enabling LoopFlatten by default. Kindly allow this week for me to do the testing.

Thank you for resolving all the issues raised! Could you rebase this patch on ToT? (to include the MSSA update and the move to LPM1?)

Done.

I'm planning to run a performance check on this, to see what the expectation is for enabling LoopFlatten by default. Kindly allow this week for me to do the testing.

Okay, excellent, thank you very much for this @asbirlea!

Harbormaster completed remote builds in B144513: Diff 401547. Jan 20 2022, 2:57 AM

ormris removed a subscriber: ormris. Jan 24 2022, 11:10 AM

Quick update: I'm not seeing any major run time regressions, but there are a few unrelated issues that may make the results I'm seeing incomplete.
I'm seeing some compile-time regressions, I'd be curious to see what the updated compiler-tracker results look like.

Hi @asbirlea, thanks for checking, much appreciated! About:

I'm seeing some compile-time regressions, I'd be curious to see what the updated compiler-tracker results look like.

Nikic ran compile time numbers, and they should still be the relevant/representative numbers as I don't think anything (relevant) has changed since then:

In D109958#3214802, @nikic wrote:

New compile-time: https://llvm-compile-time-tracker.com/compare.php?from=6f45fe9851c673883b3a258351ee4997aa2c028c&to=439d1c9613828db24ad03eb415baad2e76b8913c&stat=instructions Looks ok to me.

We may want to drop the LoopSimplifyCFG run from the LegacyPM pipeline, as that doesn't match NewPM and seems to account for the actual codegen impact of this change (LoopFlatten itself doesn't seem to be doing much in practice).

But I will double check.

Fresh numbers: https://llvm-compile-time-tracker.com/compare.php?from=0776f6e04d8c3823c31b8fa6fa7d1376a7009af1&to=9522ad62a3f461bc989d2cca3c2bf09b6dab4632&stat=instructions

I am a bit surprised that it looks to be a bit higher than @nikic's last measurements, as I don't think anything has changed functionally (https://llvm-compile-time-tracker.com/compare.php?from=6f45fe9851c673883b3a258351ee4997aa2c028c&to=439d1c9613828db24ad03eb415baad2e76b8913c&stat=instructions).

Since Nikita's measurements we made it part of an LPM and it now preserves MSSA, but again, I was not expecting this to change things. Will do some more experiments.

Gentle ping. Any further thoughts on this?

Any interest in landing it?

Herald added a project: Restricted Project. Sep 22 2022, 9:47 PM

Herald added a subscriber: kosarev.

I think we should land it.
Not repeating everything (see previous messages), but there are no disadvantages, and if it triggers it should be a win.

I was still looking for a LGTM, so interested in doing that @hiraditya? I can e.g. wait a week after the LGTM to see if there are any objections.

Let's try it. If there are any problems, there is a flag to disable this pass, or this patch can be reverted.

@nikic any comments from you?

I am waiting for internal approval to commit to LLVM. Once I get that, I plan to commit this.

I don't have an issue with committing so it gets wider testing. It can be reverted if issues do pop up.

I think clang/test/Profile/misexpect-switch-only-default-case.c may get a failure with -DLLVM_ENABLE_EXPENSIVE_CHECKS=ON which enables verify-scev if this pass is enabled by default.

I think the pass needs to call SE->forgetLoopDispositions() when it flattens loops. I don't know exactly what LoopDispositions are in SCEV so hopefully someone else can take a look.

eopXD added a subscriber: eopXD. Oct 6 2022, 2:05 AM

In D109958#3839132, @craig.topper wrote:

I think clang/test/Profile/misexpect-switch-only-default-case.c may get a failure with -DLLVM_ENABLE_EXPENSIVE_CHECKS=ON which enables verify-scev if this pass is enabled by default.

I think the pass needs to call SE->forgetLoopDispositions() when it flattens loops. I don't know exactly what LoopDispositions are in SCEV so hopefully someone else can take a look.

Thanks for catching and reporting this!
Adding SE->forgetLoopDispositions() indeed fixes that test, and passes initial testing, but I will do some more testing (with -DLLVM_ENABLE_EXPENSIVE_CHECKS=ON).

This revision was not accepted when it landed; it landed in state Needs Review. Oct 17 2022, 4:42 AM

Closed by commit rG233659c7ae9b: [LoopFlatten] Enable it by default (authored by SjoerdMeijer).

This revision was automatically updated to reflect the committed changes.

SjoerdMeijer added a commit: rG233659c7ae9b: [LoopFlatten] Enable it by default.

I have committed this, including Craig's suggestion. Let's see how it goes...

Thanks for all your help with this patch (and the previous ones).

SjoerdMeijer added a reverting change: rGa71c4e4fbba3: Revert "[LoopFlatten] Enable it by default". Oct 17 2022, 9:45 AM

I looked into the unexpectedly large compile-time regressions this pass causes, and this does appear to come down to the move from LPM2 to LPM1. I believe the reason why this has such an impact is this: Prior to this patch, passes in LPM1 did not use SCEV, only passes in LPM2 did. The move of LoopFlatten introduced a SCEV use in LPM1, which would get computed just for LoopFlatten, and then discarded again. (Note that all loop passes depend on SCEV, but it's a lazy analysis, so what matters is whether SCEV actually gets queried.)
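
A toy model of the lazy-analysis point above, not LLVM's actual analysis manager or ScalarEvolution API: the analysis costs nothing until some pass queries it, and if the cached result is then dropped, the next query pays for the computation again. The class name LazyAnalysis and its methods are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Toy stand-in for a lazily computed analysis (hypothetical; not LLVM's API).
class LazyAnalysis {
public:
  explicit LazyAnalysis(std::function<int()> Fn) : Compute(std::move(Fn)) {}

  // The expensive computation runs only when there is no cached result.
  int get() {
    if (!HasResult) {
      Result = Compute();
      HasResult = true;
      ++NumComputations;
    }
    return Result;
  }

  // Models the cached result being discarded between pipeline stages.
  void invalidate() { HasResult = false; }

  int numComputations() const { return NumComputations; }

private:
  std::function<int()> Compute;
  int Result = 0;
  bool HasResult = false;
  int NumComputations = 0;
};
```

In this model, a pipeline stage that never calls get() adds no cost, while a stage that queries once and is followed by invalidate() forces a later stage to recompute from scratch.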

The fact that LoopFlatten does not work properly after IndVarSimplify is fairly concerning, because IndVars is an IV canonicalization pass, and other passes are supposed to deal with the IVs it produces. The fact that LoopFlatten actually does its own IV widening (which is generally the responsibility of IndVars) is further indication that something is going wrong here.

In D109958#3862780, @nikic wrote:

I looked into the unexpectedly large compile-time regressions this pass causes, and this does appear to come down to the move from LPM2 to LPM1. I believe the reason why this has such an impact is this: Prior to this patch, passes in LPM1 did not use SCEV, only passes in LPM2 did. The move of LoopFlatten introduced a SCEV use in LPM1, which would get computed just for LoopFlatten, and then discarded again. (Note that all loop passes depend on SCEV, but it's a lazy analysis, so what matters is whether SCEV actually gets queried.)

Hm, I've just recommitted this as the sanitizer bot failures seemed unrelated.

The fact that LoopFlatten does not work properly after IndVarSimplify is fairly concerning, because IndVars is an IV canonicalization pass, and other passes are supposed to deal with the IVs it produces. The fact that LoopFlatten actually does its own IV widening (which is generally the responsibility of IndVars) is further indication that something is going wrong here.

Not sure I fully agree with "does not work properly"; it would be more precise to say "less effective", because there's less chance of it triggering. The IV widening helps avoid overflow checks. If IndVars comes along first, that opportunity is gone. With the widened IVs, the legality is very difficult to determine. That's the rationale behind this design decision.
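
To make the overflow concern concrete, here is a small sketch, assuming 32-bit unsigned trip counts: the flattened loop runs N * M iterations, and that product can wrap in the original type but not after widening to 64 bits. The function names are hypothetical and this is plain C++ arithmetic, not the pass's implementation.

```cpp
#include <cassert>
#include <cstdint>

// Flattened trip count computed in the original 32-bit type.
// Unsigned multiplication wraps, so the result can be too small.
std::uint32_t tripCountNarrow(std::uint32_t N, std::uint32_t M) {
  return static_cast<std::uint32_t>(N * M);
}

// Same computation after widening both operands to 64 bits.
// The product of two 32-bit values always fits in 64 bits.
std::uint64_t tripCountWide(std::uint32_t N, std::uint32_t M) {
  return static_cast<std::uint64_t>(N) * static_cast<std::uint64_t>(M);
}

// Hypothetical helper: true when the narrow product has wrapped, i.e. when
// flattening in the narrow type would need an overflow check to be safe.
bool narrowProductOverflows(std::uint32_t N, std::uint32_t M) {
  return tripCountWide(N, M) != tripCountNarrow(N, M);
}
```

With the wide form the product is exact for any pair of 32-bit inputs, so no runtime check is needed; with the narrow form, a guard like narrowProductOverflows would be required first.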

aeubanks added inline comments. Oct 17 2022, 12:40 PM

llvm/docs/ReleaseNotes.rst
15 ↗	(On Diff #468165)	This doesn't look like the right place for the notes; the `Non-comprehensive list of changes in this release` section looks better.

llvm/lib/Transforms/Scalar/LoopFlatten.cpp
765 ↗	(On Diff #468165)	This should probably have a proper IR test case and be committed separately (the committed separately part is moot now, but mentioned in case this gets reverted again at some point).

SjoerdMeijer added inline comments. Oct 18 2022, 11:13 AM

llvm/lib/Transforms/Scalar/LoopFlatten.cpp
765 ↗	(On Diff #468165)	Have just reverted it and am going to look at the miscompilation reported in: https://github.com/llvm/llvm-project/issues/58441
Will also address these comments, thanks for that.

craig.topper mentioned this in D137651: [LoopFlatten] Forget all block and loop dispositions after flatten. Nov 13 2022, 9:19 PM

Just a heads up that I am going to recommit this tomorrow.

From the previous two attempts to enable this, two miscompilations were reported: #58441 and #59339. Both have been fixed by @dmgreen.
I strongly feel they were two exceptions, so with the release out of the door, now is a good time to reland this.

CC: @lebedev.ri

Herald added a subscriber: StephenFan. Mar 20 2023, 6:33 AM

FWIW, I am quite unhappy with the implementation quality of this pass, but I don't think I have the energy to deal with this. In the future, due diligence for pass enablement needs to include a review of the pass implementation by a domain expert, if this was not already done as part of the initial implementation. (Domain expert = SCEV reviewer in this context.)

In D109958#4206644, @nikic wrote:

FWIW, I am quite unhappy with the implementation quality of this pass, but I don't think I have the energy to deal with this. In the future, due diligence for pass enablement needs to include a review of the pass implementation by a domain expert, if this was not already done as part of the initial implementation. (Domain expert = SCEV reviewer in this context.)

I think compiler people are always a bit unhappy about compilers. ;-)
During its development, I believe this pass had reviews from people that I think qualify as domain experts.
I understand the tension here between using SCEV and some pattern matching, but I think the implementation is reasonable given assumptions/restrictions of this pass.
I am open to suggestions, so I guess the best way forward is to drop an email to the SCEV owner asking for an assessment.

Revision Contents

Path                                                             Size
llvm/lib/Transforms/IPO/PassManagerBuilder.cpp                   2 lines
llvm/test/CodeGen/AMDGPU/opt-pipeline.ll                         12 lines
llvm/test/Other/new-pm-defaults.ll                               3 lines
llvm/test/Other/new-pm-lto-defaults.ll                           3 lines
llvm/test/Other/new-pm-thinlto-defaults.ll                       3 lines
llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll          3 lines
llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll    3 lines
llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll           3 lines
llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll     3 lines

Diff 377848

llvm/lib/Transforms/IPO/PassManagerBuilder.cpp

 cl::opt<bool> EnableLoopInterchange(
     "enable-loopinterchange", cl::init(false), cl::Hidden,
     cl::desc("Enable the experimental LoopInterchange Pass"));

 cl::opt<bool> EnableUnrollAndJam("enable-unroll-and-jam", cl::init(false),
                                  cl::Hidden,
                                  cl::desc("Enable Unroll And Jam Pass"));

-cl::opt<bool> EnableLoopFlatten("enable-loop-flatten", cl::init(false),
+cl::opt<bool> EnableLoopFlatten("enable-loop-flatten", cl::init(true),
                                 cl::Hidden,
                                 cl::desc("Enable the LoopFlatten Pass"));

 cl::opt<bool> EnableDFAJumpThreading("enable-dfa-jump-thread",
                                      cl::desc("Enable DFA jump threading."),
                                      cl::init(false), cl::Hidden);

 static cl::opt<bool>
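For readers unfamiliar with the pass that this flag turns on, the following is a minimal source-level sketch of the kind of rewrite loop flattening performs: a two-deep loop nest that walks an array linearly is collapsed into a single loop. The function names `sumNested` and `sumFlattened` are hypothetical and exist only for illustration; the real pass works on LLVM IR and has to establish that the combined trip count cannot overflow before it transforms anything.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Nested form: the inner loop recomputes the linear index i * cols + j.
long sumNested(const std::vector<int> &data, std::size_t rows,
               std::size_t cols) {
  assert(data.size() == rows * cols && "data must hold rows * cols elements");
  long sum = 0;
  for (std::size_t i = 0; i < rows; ++i)
    for (std::size_t j = 0; j < cols; ++j)
      sum += data[i * cols + j];
  return sum;
}

// Flattened form: a single loop over rows * cols iterations. The two forms
// are only equivalent when rows * cols does not overflow the induction
// variable type; this sketch simply assumes that it does not.
long sumFlattened(const std::vector<int> &data, std::size_t rows,
                  std::size_t cols) {
  assert(data.size() == rows * cols && "data must hold rows * cols elements");
  long sum = 0;
  for (std::size_t k = 0; k < rows * cols; ++k)
    sum += data[k];
  return sum;
}
```

Both functions return the same sum for the same input. The flattened version has one induction variable and one exit test, which is where the benefit on small inner loops would be expected to come from.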

llvm/test/CodeGen/AMDGPU/opt-pipeline.ll

 ; GCN-O1-NEXT: Lazy Branch Probability Analysis
 ; GCN-O1-NEXT: Lazy Block Frequency Analysis
 ; GCN-O1-NEXT: Optimization Remark Emitter
 ; GCN-O1-NEXT: Combine redundant instructions
 ; GCN-O1-NEXT: Canonicalize natural loops
 ; GCN-O1-NEXT: LCSSA Verifier
 ; GCN-O1-NEXT: Loop-Closed SSA Form Pass
 ; GCN-O1-NEXT: Scalar Evolution Analysis
+; GCN-O1-NEXT: Flattens loops
+; GCN-O1-NEXT: Memory SSA

    Inline comments:

    nikic:
    I'm somewhat confused by what is going on here. Why do we now calculate MemorySSA and why does LoopUnroll get split into a separate LPM?

    SjoerdMeijer:
    I have no idea and accepted this as something the pass manager decided to do... I am also confused about both things: I have no idea why we need to rerun Memory SSA, and don't see why LoopUnroll is now run separately. I will look into this, see if I can get any wiser here...

    SjoerdMeijer:
    In `lib/Passes/PassBuilderPipelines.cpp` we have this:

        if (EnableLoopFlatten)
          FPM.addPass(createFunctionToLoopPassAdaptor(LoopFlattenPass()));
        // The loop passes in LPM2 (LoopFullUnrollPass) do not preserve MemorySSA.
        // All loop passes must preserve it, in order to be able to use it.
        FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM2),
                                                    /*UseMemorySSA=*/false,
                                                    /*UseBlockFrequencyInfo=*/false));

    Since this is talking about MemorySSA this might be related, but there are so many things going on here and I am still looking, so don't know for certain. If e.g. @aeubanks has some tips or suggestions here, I would be happy to receive them. :)

    aeubanks:
    This is a legacy PM test. I'm not super familiar with the legacy PM, but it's probably something to do with the fact that the legacy loop flatten pass is a function pass and perhaps doesn't preserve some analyses? Could we perhaps just make this change for the new PM?

    SjoerdMeijer:
    > I'm not super familiar with the legacy PM, but it's probably something to do with the fact that the legacy loop flatten pass is a function pass and perhaps doesn't preserve some analyses?

    Yep, that's what I thought too, but didn't get to the bottom of it and can't "prove" it.

    > Could we perhaps just make this change for the new PM?

    Thanks for the suggestion, I am happy to enable LoopFlatten by default only for the new PM, so will look into that. Diverging is probably not ideal, but since the legacy PM is deprecated I am happy to do that. The alternative is to "accept" that this is what the legacy PM does, but looks like we are not happy with that, so again, will do this only for the new PM.

    asbirlea:
    The reason for this change in pipeline is due to where LoopFlatten is added in the legacy pass manager, along with LoopSimplifyCFG (llvm/lib/Transforms/IPO/PassManagerBuilder.cpp:472):

        // We resume loop passes creating a second loop pipeline here.
        if (EnableLoopFlatten) {
          MPM.add(createLoopFlattenPass()); // Flatten loops
          MPM.add(createLoopSimplifyCFGPass());
        }

    It looks like LoopSimplifyCFG requires MemorySSA, which explains why that is being computed and why the loop pipeline is split (LoopUnroll does not preserve MemorySSA, so it's getting split out). I don't think LoopSimplifyCFG should require MemorySSA, just preserve it if available. D111578 should simplify these tests.

 ; GCN-O1-NEXT: Loop Pass Manager
+; GCN-O1-NEXT: Simplify loop CFG
 ; GCN-O1-NEXT: Recognize loop idioms
 ; GCN-O1-NEXT: Induction Variable Simplification
 ; GCN-O1-NEXT: Delete dead loops
+; GCN-O1-NEXT: Loop Pass Manager
 ; GCN-O1-NEXT: Unroll loops
 ; GCN-O1-NEXT: SROA
 ; GCN-O1-NEXT: Sparse Conditional Constant Propagation
 ; GCN-O1-NEXT: Demanded bits analysis
 ; GCN-O1-NEXT: Bit-Tracking Dead Code Elimination
 ; GCN-O1-NEXT: Function Alias Analysis Results
 ; GCN-O1-NEXT: Lazy Branch Probability Analysis
 ; GCN-O1-NEXT: Lazy Block Frequency Analysis
 [...]
 ; GCN-O2-NEXT: Lazy Branch Probability Analysis
 ; GCN-O2-NEXT: Lazy Block Frequency Analysis
 ; GCN-O2-NEXT: Optimization Remark Emitter
 ; GCN-O2-NEXT: Combine redundant instructions
 ; GCN-O2-NEXT: Canonicalize natural loops
 ; GCN-O2-NEXT: LCSSA Verifier
 ; GCN-O2-NEXT: Loop-Closed SSA Form Pass
 ; GCN-O2-NEXT: Scalar Evolution Analysis
+; GCN-O2-NEXT: Flattens loops
+; GCN-O2-NEXT: Memory SSA
 ; GCN-O2-NEXT: Loop Pass Manager
+; GCN-O2-NEXT: Simplify loop CFG
 ; GCN-O2-NEXT: Recognize loop idioms
 ; GCN-O2-NEXT: Induction Variable Simplification
 ; GCN-O2-NEXT: Delete dead loops
+; GCN-O2-NEXT: Loop Pass Manager
 ; GCN-O2-NEXT: Unroll loops
 ; GCN-O2-NEXT: SROA
 ; GCN-O2-NEXT: Function Alias Analysis Results
 ; GCN-O2-NEXT: MergedLoadStoreMotion
 ; GCN-O2-NEXT: Phi Values Analysis
 ; GCN-O2-NEXT: Function Alias Analysis Results
 ; GCN-O2-NEXT: Memory Dependence Analysis
 ; GCN-O2-NEXT: Lazy Branch Probability Analysis
 [...]
 ; GCN-O3-NEXT: Lazy Branch Probability Analysis
 ; GCN-O3-NEXT: Lazy Block Frequency Analysis
 ; GCN-O3-NEXT: Optimization Remark Emitter
 ; GCN-O3-NEXT: Combine redundant instructions
 ; GCN-O3-NEXT: Canonicalize natural loops
 ; GCN-O3-NEXT: LCSSA Verifier
 ; GCN-O3-NEXT: Loop-Closed SSA Form Pass
 ; GCN-O3-NEXT: Scalar Evolution Analysis
+; GCN-O3-NEXT: Flattens loops
+; GCN-O3-NEXT: Memory SSA
 ; GCN-O3-NEXT: Loop Pass Manager
+; GCN-O3-NEXT: Simplify loop CFG
 ; GCN-O3-NEXT: Recognize loop idioms
 ; GCN-O3-NEXT: Induction Variable Simplification
 ; GCN-O3-NEXT: Delete dead loops
+; GCN-O3-NEXT: Loop Pass Manager
 ; GCN-O3-NEXT: Unroll loops
 ; GCN-O3-NEXT: SROA
 ; GCN-O3-NEXT: Function Alias Analysis Results
 ; GCN-O3-NEXT: MergedLoadStoreMotion
 ; GCN-O3-NEXT: Phi Values Analysis
 ; GCN-O3-NEXT: Function Alias Analysis Results
 ; GCN-O3-NEXT: Memory Dependence Analysis
 ; GCN-O3-NEXT: Lazy Branch Probability Analysis
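The explanation asbirlea gives in the inline thread for this file can be illustrated with a toy model. The sketch below is not legacy pass manager code: `ToyPass` and `groupLoopPasses` are hypothetical names, the grouping rule is a deliberate simplification, and the requires/preserves flags are assumptions chosen to mirror the discussion (LoopSimplifyCFG requires MemorySSA, LoopUnroll does not preserve it). Under those assumptions it reproduces the split into two "Loop Pass Manager" groups seen in the updated check lines.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of a loop pass for this toy model only.
struct ToyPass {
  std::string Name;
  bool RequiresMemorySSA;
  bool PreservesMemorySSA;
};

// Simplified rule: a pass that does not preserve MemorySSA cannot join a
// group in which an earlier pass uses MemorySSA, so it starts a new group.
std::vector<std::vector<std::string>>
groupLoopPasses(const std::vector<ToyPass> &Passes) {
  std::vector<std::vector<std::string>> Groups;
  bool GroupUsesMemorySSA = false;
  for (const ToyPass &P : Passes) {
    bool MustSplit = GroupUsesMemorySSA && !P.PreservesMemorySSA;
    if (Groups.empty() || MustSplit) {
      Groups.emplace_back();
      GroupUsesMemorySSA = false;
    }
    Groups.back().push_back(P.Name);
    GroupUsesMemorySSA = GroupUsesMemorySSA || P.RequiresMemorySSA;
  }
  return Groups;
}

// Loop passes from the check lines; the flags are assumptions (see above).
std::vector<ToyPass> toyLoopPasses(bool WithLoopSimplifyCFG) {
  std::vector<ToyPass> Passes;
  if (WithLoopSimplifyCFG)
    Passes.push_back({"Simplify loop CFG", true, true});
  Passes.push_back({"Recognize loop idioms", false, true});
  Passes.push_back({"Induction Variable Simplification", false, true});
  Passes.push_back({"Delete dead loops", false, true});
  Passes.push_back({"Unroll loops", false, false});
  return Passes;
}
```

Calling `groupLoopPasses(toyLoopPasses(true))` yields two groups with "Unroll loops" alone in the second, while `groupLoopPasses(toyLoopPasses(false))` keeps all four passes in one group, matching the before and after columns of the diff.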

llvm/test/Other/new-pm-defaults.ll

 ; CHECK-O-NEXT: Running pass: LICM
 ; CHECK-O-NEXT: Running pass: LoopRotatePass
 ; CHECK-O-NEXT: Running pass: LICM
 ; CHECK-O-NEXT: Running pass: SimpleLoopUnswitchPass
 ; CHECK-O-NEXT: Running pass: SimplifyCFGPass
 ; CHECK-O-NEXT: Running pass: InstCombinePass
 ; CHECK-O-NEXT: Running pass: LoopSimplifyPass
 ; CHECK-O-NEXT: Running pass: LCSSAPass
+; CHECK-O-NEXT: Running pass: LoopFlatten
+; CHECK-O-NEXT: Running pass: LoopSimplifyPass
+; CHECK-O-NEXT: Running pass: LCSSAPass

    Inline comments:

    nikic:
    As a drive-by comment for @asbirlea and @aeubanks, it looks like there's an opportunity here to not rerun LoopSimplify/LCSSA if we run multiple Loop/LoopNest pass managers back to back?

    aeubanks:
    We need to fix where we add `LoopFlattenPass` in PassBuilderPipelines.cpp. Instead of

        if (EnableLoopFlatten)
          FPM.addPass(createFunctionToLoopPassAdaptor(LoopFlattenPass()));

    we should move it earlier to

        if (EnableLoopFlatten)
          LPM2.addPass(LoopFlattenPass());

    asbirlea:
    +1 it should be added to a LPM.

    nikic:
    Oh, I wasn't aware that loop passes and loop nest passes can run in the same pass manager. That makes more sense indeed.

    SjoerdMeijer:
    Thanks for commenting on this and the suggestions! LoopFlatten was intentionally added to where it currently lives for the LPM. LoopFlatten removes an inner loop, and a loop pass was not able to deal with that very well under the LPM. But this is probably out of date for the NPM, and LoopFlatten was made a LoopNest pass in D102904, so I guess that means we can just add to where you suggested.

 ; CHECK-O-NEXT: Running pass: LoopIdiomRecognizePass
 ; CHECK-O-NEXT: Running pass: IndVarSimplifyPass
 ; CHECK-EP-LOOP-LATE-NEXT: Running pass: NoOpLoopPass
 ; CHECK-O-NEXT: Running pass: LoopDeletionPass
 ; CHECK-O-NEXT: Running pass: LoopFullUnrollPass
 ; CHECK-EP-LOOP-END-NEXT: Running pass: NoOpLoopPass
 ; CHECK-O-NEXT: Running pass: SROA on foo
 ; CHECK-MATRIX: Running pass: VectorCombinePass
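The point raised in the inline thread for this file (a separate adaptor for LoopFlatten causes LoopSimplify and LCSSA to run a second time) can be sketched with a toy model. Nothing here is LLVM API: `ToyLoopAdaptor`, `runAdaptors` and the helper functions are hypothetical, and the only behaviour modelled is the assumption, taken from the check lines in this diff, that each adaptor canonicalizes loops before running its own loop passes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a function-to-loop adaptor holding a list of loop passes.
struct ToyLoopAdaptor {
  std::vector<std::string> LoopPasses;
};

// Each adaptor first logs the two canonicalization passes, then its own.
std::vector<std::string>
runAdaptors(const std::vector<ToyLoopAdaptor> &Adaptors) {
  std::vector<std::string> Log;
  for (const ToyLoopAdaptor &A : Adaptors) {
    Log.push_back("LoopSimplifyPass");
    Log.push_back("LCSSAPass");
    for (const std::string &P : A.LoopPasses)
      Log.push_back(P);
  }
  return Log;
}

// Counts how often a pass name appears in a log.
int countRuns(const std::vector<std::string> &Log, const std::string &Name) {
  int Count = 0;
  for (const std::string &Entry : Log)
    if (Entry == Name)
      ++Count;
  return Count;
}

const std::vector<std::string> LaterLoopPasses = {
    "LoopIdiomRecognizePass", "IndVarSimplifyPass", "LoopDeletionPass",
    "LoopFullUnrollPass"};

// Layout in this diff: LoopFlatten gets its own adaptor.
std::vector<std::string> separateAdaptors() {
  ToyLoopAdaptor Flatten;
  Flatten.LoopPasses = {"LoopFlatten"};
  ToyLoopAdaptor LPM2;
  LPM2.LoopPasses = LaterLoopPasses;
  return runAdaptors({Flatten, LPM2});
}

// Suggested layout: LoopFlatten is added to the existing loop pass manager.
std::vector<std::string> sharedManager() {
  ToyLoopAdaptor LPM2;
  LPM2.LoopPasses = {"LoopFlatten"};
  LPM2.LoopPasses.insert(LPM2.LoopPasses.end(), LaterLoopPasses.begin(),
                         LaterLoopPasses.end());
  return runAdaptors({LPM2});
}
```

In this model `separateAdaptors()` logs LoopSimplifyPass and LCSSAPass twice, in the same order as the added check lines, whereas `sharedManager()` logs them once. That is the saving the suggested change is after.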

llvm/test/Other/new-pm-lto-defaults.ll

 ; CHECK-O23SZ-NEXT: Running pass: LICMPass on Loop
 ; CHECK-O23SZ-NEXT: Running pass: GVN on foo
 ; CHECK-O23SZ-NEXT: Running analysis: MemoryDependenceAnalysis on foo
 ; CHECK-O23SZ-NEXT: Running analysis: PhiValuesAnalysis on foo
 ; CHECK-O23SZ-NEXT: Running pass: MemCpyOptPass on foo
 ; CHECK-O23SZ-NEXT: Running pass: DSEPass on foo
 ; CHECK-O23SZ-NEXT: Running analysis: PostDominatorTreeAnalysis on foo
 ; CHECK-O23SZ-NEXT: Running pass: MergedLoadStoreMotionPass on foo
+; CHECK-O23SZ-NEXT: Running pass: LoopSimplifyPass
+; CHECK-O23SZ-NEXT: Running pass: LCSSAPass
+; CHECK-O23SZ-NEXT: Running pass: LoopFlatten
 ; CHECK-O23SZ-NEXT: Running pass: LoopSimplifyPass on foo
 ; CHECK-O23SZ-NEXT: Running pass: LCSSAPass on foo
 ; CHECK-O23SZ-NEXT: Running pass: IndVarSimplifyPass on Loop
 ; CHECK-O23SZ-NEXT: Running pass: LoopDeletionPass on Loop
 ; CHECK-O23SZ-NEXT: Running pass: LoopFullUnrollPass on Loop
 ; CHECK-O23SZ-NEXT: Running pass: LoopDistributePass on foo
 ; CHECK-O23SZ-NEXT: Running pass: LoopVectorizePass on foo
 ; CHECK-O23SZ-NEXT: Running analysis: BlockFrequencyAnalysis on foo

llvm/test/Other/new-pm-thinlto-defaults.ll

 ; CHECK-O-NEXT: Running pass: LICM
 ; CHECK-O-NEXT: Running pass: LoopRotatePass
 ; CHECK-O-NEXT: Running pass: LICM
 ; CHECK-O-NEXT: Running pass: SimpleLoopUnswitchPass
 ; CHECK-O-NEXT: Running pass: SimplifyCFGPass
 ; CHECK-O-NEXT: Running pass: InstCombinePass
 ; CHECK-O-NEXT: Running pass: LoopSimplifyPass
 ; CHECK-O-NEXT: Running pass: LCSSAPass
+; CHECK-O-NEXT: Running pass: LoopFlatten
+; CHECK-O-NEXT: Running pass: LoopSimplifyPass
+; CHECK-O-NEXT: Running pass: LCSSAPass
 ; CHECK-O-NEXT: Running pass: LoopIdiomRecognizePass
 ; CHECK-O-NEXT: Running pass: IndVarSimplifyPass
 ; CHECK-O-NEXT: Running pass: LoopDeletionPass
 ; CHECK-O-NEXT: Running pass: LoopFullUnrollPass
 ; CHECK-O-NEXT: Running pass: SROA on foo
 ; CHECK-Os-NEXT: Running pass: MergedLoadStoreMotionPass
 ; CHECK-Os-NEXT: Running pass: GVN
 ; CHECK-Os-NEXT: Running analysis: MemoryDependenceAnalysis

llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll

 ; CHECK-O-NEXT: Running pass: LICM
 ; CHECK-O-NEXT: Running pass: LoopRotatePass
 ; CHECK-O-NEXT: Running pass: LICM
 ; CHECK-O-NEXT: Running pass: SimpleLoopUnswitchPass
 ; CHECK-O-NEXT: Running pass: SimplifyCFGPass
 ; CHECK-O-NEXT: Running pass: InstCombinePass
 ; CHECK-O-NEXT: Running pass: LoopSimplifyPass
 ; CHECK-O-NEXT: Running pass: LCSSAPass
+; CHECK-O-NEXT: Running pass: LoopFlatten
+; CHECK-O-NEXT: Running pass: LoopSimplifyPass
+; CHECK-O-NEXT: Running pass: LCSSAPass
 ; CHECK-O-NEXT: Running pass: LoopIdiomRecognizePass
 ; CHECK-O-NEXT: Running pass: IndVarSimplifyPass
 ; CHECK-O-NEXT: Running pass: LoopDeletionPass
 ; CHECK-O-NEXT: Running pass: LoopFullUnrollPass
 ; CHECK-O-NEXT: Running pass: SROA on foo
 ; CHECK-Os-NEXT: Running pass: MergedLoadStoreMotionPass
 ; CHECK-Os-NEXT: Running pass: GVN
 ; CHECK-Os-NEXT: Running analysis: MemoryDependenceAnalysis

llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll

 ; CHECK-O-NEXT: Running pass: LICM
 ; CHECK-O-NEXT: Running pass: LoopRotatePass
 ; CHECK-O-NEXT: Running pass: LICM
 ; CHECK-O-NEXT: Running pass: SimpleLoopUnswitchPass
 ; CHECK-O-NEXT: Running pass: SimplifyCFGPass
 ; CHECK-O-NEXT: Running pass: InstCombinePass
 ; CHECK-O-NEXT: Running pass: LoopSimplifyPass
 ; CHECK-O-NEXT: Running pass: LCSSAPass
+; CHECK-O-NEXT: Running pass: LoopFlatten
+; CHECK-O-NEXT: Running pass: LoopSimplifyPass
+; CHECK-O-NEXT: Running pass: LCSSAPass
 ; CHECK-O-NEXT: Running pass: LoopIdiomRecognizePass
 ; CHECK-O-NEXT: Running pass: IndVarSimplifyPass
 ; CHECK-O-NEXT: Running pass: LoopDeletionPass
 ; CHECK-O-NEXT: Running pass: LoopFullUnrollPass
 ; CHECK-O-NEXT: Running pass: SROA on foo
 ; CHECK-Os-NEXT: Running pass: MergedLoadStoreMotionPass
 ; CHECK-Os-NEXT: Running pass: GVN
 ; CHECK-Os-NEXT: Running analysis: MemoryDependenceAnalysis

llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll

 ; CHECK-O-NEXT: Running pass: LICM
 ; CHECK-O-NEXT: Running pass: LoopRotatePass
 ; CHECK-O-NEXT: Running pass: LICM
 ; CHECK-O-NEXT: Running pass: SimpleLoopUnswitchPass
 ; CHECK-O-NEXT: Running pass: SimplifyCFGPass
 ; CHECK-O-NEXT: Running pass: InstCombinePass
 ; CHECK-O-NEXT: Running pass: LoopSimplifyPass
 ; CHECK-O-NEXT: Running pass: LCSSAPass
+; CHECK-O-NEXT: Running pass: LoopFlatten
+; CHECK-O-NEXT: Running pass: LoopSimplifyPass
+; CHECK-O-NEXT: Running pass: LCSSAPass
 ; CHECK-O-NEXT: Running pass: LoopIdiomRecognizePass
 ; CHECK-O-NEXT: Running pass: IndVarSimplifyPass
 ; CHECK-O-NEXT: Running pass: LoopDeletionPass
 ; CHECK-O-NEXT: Running pass: LoopFullUnrollPass
 ; CHECK-O-NEXT: Running pass: SROA on foo
 ; CHECK-Os-NEXT: Running pass: MergedLoadStoreMotionPass
 ; CHECK-Os-NEXT: Running pass: GVN
 ; CHECK-Os-NEXT: Running analysis: MemoryDependenceAnalysis

llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll

 ; CHECK-O-NEXT: Running pass: LICM
 ; CHECK-O-NEXT: Running pass: LoopRotatePass
 ; CHECK-O-NEXT: Running pass: LICM
 ; CHECK-O-NEXT: Running pass: SimpleLoopUnswitchPass
 ; CHECK-O-NEXT: Running pass: SimplifyCFGPass
 ; CHECK-O-NEXT: Running pass: InstCombinePass
 ; CHECK-O-NEXT: Running pass: LoopSimplifyPass
 ; CHECK-O-NEXT: Running pass: LCSSAPass
+; CHECK-O-NEXT: Running pass: LoopFlatten
+; CHECK-O-NEXT: Running pass: LoopSimplifyPass
+; CHECK-O-NEXT: Running pass: LCSSAPass
 ; CHECK-O-NEXT: Running pass: LoopIdiomRecognizePass
 ; CHECK-O-NEXT: Running pass: IndVarSimplifyPass
 ; CHECK-O-NEXT: Running pass: LoopDeletionPass
 ; CHECK-O-NEXT: Running pass: SROA on foo
 ; CHECK-Os-NEXT: Running pass: MergedLoadStoreMotionPass
 ; CHECK-Os-NEXT: Running pass: GVN
 ; CHECK-Os-NEXT: Running analysis: MemoryDependenceAnalysis
 ; CHECK-Os-NEXT: Running analysis: PhiValuesAnalysis
