This is an archive of the discontinued LLVM Phabricator instance.

[Refactor] Replace RegionPasses by FunctionPasses
Abandoned · Public

Authored by jdoerfert on Mar 1 2015, 10:55 AM.

Details

Summary
The main change is the switch from region passes to function passes.
This saves compile time because we no longer have to query the
TempScopInfo for each region. However, due to the changed interface
and the now explicit iteration over the regions of a function, other
adjustments were made as well.
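
For orientation, here is a minimal sketch of the overall shape of such a pass (not the actual patch): a function pass that walks the region tree itself instead of being scheduled once per region. It is written against the legacy pass manager, pass registration is omitted, and the class name and processRegion() are made up for this sketch.

// Minimal sketch, legacy pass manager; processRegion() is illustrative only.
#include "llvm/ADT/SmallVector.h"
#include "llvm/Analysis/RegionInfo.h"
#include "llvm/IR/Function.h"
#include "llvm/Pass.h"

using namespace llvm;

namespace {
struct RegionWalkSketch : public FunctionPass {
  static char ID;
  RegionWalkSketch() : FunctionPass(ID) {}

  bool runOnFunction(Function &F) override {
    // Instead of the pass manager invoking us once per region, we iterate
    // over all regions of the function ourselves.
    RegionInfo &RI = getAnalysis<RegionInfoPass>().getRegionInfo();
    SmallVector<Region *, 8> Worklist;
    Worklist.push_back(RI.getTopLevelRegion());
    while (!Worklist.empty()) {
      Region *R = Worklist.pop_back_val();
      processRegion(*R); // hypothetical per-region work, e.g. build a SCoP
      for (const auto &SubRegion : *R)
        Worklist.push_back(SubRegion.get());
    }
    return false; // analysis only, the IR is not modified
  }

  void processRegion(Region &R) { /* ... */ }

  void getAnalysisUsage(AnalysisUsage &AU) const override {
    AU.addRequired<RegionInfoPass>();
    AU.setPreservesAll();
  }
};
char RegionWalkSketch::ID = 0;
} // end anonymous namespace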

Diff Detail

Event Timeline

jdoerfert updated this revision to Diff 20961.Mar 1 2015, 10:55 AM
jdoerfert retitled this revision from to [Refactor] Replace RegionPasses by FunctionPasses.
jdoerfert updated this object.
jdoerfert added subscribers: Restricted Project, Unknown Object (MLST).
jdoerfert added inline comments.Mar 1 2015, 11:07 AM
lib/Analysis/ScopInfo.cpp
2014–2015

A Scops.pop_back() is missing here.

2024–2025

This line needs to be removed in order to loop again.

jdoerfert updated this revision to Diff 20963.Mar 1 2015, 11:17 AM

Small fixes

jdoerfert updated this revision to Diff 20974.Mar 1 2015, 7:38 PM

Allow multiple SCoPs in ScopPasses + fix tests

I will cut this commit into pieces (as far as possible) but I wanted to get some initial feedback.

grosser edited edge metadata.Mar 3 2015, 12:39 AM

Hi Johannes,

what is the motivation and the impact of this change? The only motivation you give in the commit message is compile time. Did you measure any significant improvements here? I personally think there are very good reasons to change/improve our pass infrastructure, but I am surprised that compile time is the one and only reason motivating your change.

I see two main reasons to work on the pass infrastructure:

  1. Make it work with Chandler's new pass manager

Chandler's new pass manager uses caching to keep multiple analysis results available. I believe that when we change our pass infrastructure, we should make sure it will work with Chandler's new pass manager. Besides his last PassManager talk, commits such as https://llvm.org/svn/llvm-project/llvm/trunk@226560 show the idea of caching results (a minimal illustration of this caching idea follows after this list).

  2. Fix the LoopInfo/RegionInfo misconception

We currently assume that ScopPasses on different regions do not affect each other. However, they do affect each other: code-generating one scop may invalidate the next scop. Hence, we have some hacks in place to detect these invalidated scops. The (implicit) pass-order change your patch brings does not seem to improve this in any way.
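
As a concrete (if anachronistic) illustration of the caching idea under point 1, here is a minimal sketch using the new-pass-manager API as it looks in current LLVM; it only demonstrates the concept and is not part of this patch.

#include "llvm/Analysis/LoopInfo.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/PassManager.h"

using namespace llvm;

// Analyses are queried through an analysis manager: the first query computes
// and caches the result; later queries return the cached object until a
// transformation invalidates it.
void useCachedLoopInfo(Function &F, FunctionAnalysisManager &FAM) {
  LoopInfo &LI = FAM.getResult<LoopAnalysis>(F); // computed once, then cached
  for (Loop *L : LI)
    (void)L; // e.g. inspect the top-level loops
}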

I am not saying your patch should solve those two issues, but maybe now is a good time to reason about them and at least make sure these changes do not negatively affect them (or possibly even help to solve some of them).

I added a couple of first comments to your patch, but before going further I would like to understand your intentions.

Cheers,
Tobias

include/polly/DependenceInfo.h
1 ↗(On Diff #20974)

Needs update, if renamed.

24 ↗(On Diff #20974)

Needs update, if renamed.

140 ↗(On Diff #20974)

The idea of introducing an analysis result for each individual Scop is very much in line with Chandler's new analysis infrastructure. However, to me it seems you only went halfway by leaving most functions on the pass itself. Looking at this, introducing per-scop dependence objects seems to be an almost independent change.

include/polly/LinkAllPasses.h
31 ↗(On Diff #20974)

I would personally not perform such renaming as part this patch, as it causes noise all over the place.

lib/Analysis/DependenceInfo.cpp
1 ↗(On Diff #20974)

Needs an update if you want to perform the renaming.

351 ↗(On Diff #20974)

These are a lot of D.* calls. Would it make sense to make all of this functionality part of the Dependences object/class?

453 ↗(On Diff #20974)

I have the feeling that passing the scop to each of these functions unnecessarily complicates the interface. Could we not ask once for the dependences object of this scop and then work with it?

Hey Tobias,

let me give you a short answer now and a more detailed one later:

  • Compile time: We perform much better on large tests with many regions. Here are some examples after one lnt run for a release build (change in compile time, old -> new times in seconds):

    MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4: -89.42% (345.5400 -> 36.5533)
    SingleSource/Benchmarks/Misc-C++-EH/spirit: -78.71% (33.4433 -> 7.1200)
    MultiSource/Applications/kimwitu++/kc: -71.26% (77.1700 -> 22.1766)
    MultiSource/Benchmarks/MiBench/consumer-typeset/consumer-typeset: -67.55% (65.2370 -> 21.1667)
    SingleSource/UnitTests/DefaultInitDynArrays: -50.00% (0.0200 -> 0.0100)
    SingleSource/Benchmarks/Adobe-C++/simple_types_constant_folding: -42.39% (8.6267 -> 4.9700)
    MultiSource/Applications/sqlite3/sqlite3: -42.16% (60.4734 -> 34.9800)
    MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000: -39.79% (19.6970 -> 11.8601)
    SingleSource/Regression/C++/EH/inlined_cleanup: -38.34% (0.0433 -> 0.0267)
    MultiSource/Applications/JM/ldecod/ldecod: -37.42% (23.0533 -> 14.4267)
    MultiSource/Applications/oggenc/oggenc: -33.81% (24.6900 -> 16.3433)
    SingleSource/Benchmarks/Stanford/Treesort: -31.79% (0.0733 -> 0.0500)
    MultiSource/Applications/lemon/lemon: -30.80% (2.8467 -> 1.9700)
    SingleSource/Benchmarks/Shootout/strcat: -30.72% (0.0433 -> 0.0300)
    MultiSource/Applications/JM/lencod/lencod: -28.96% (39.8598 -> 28.3167)
    MultiSource/Benchmarks/PAQ8p/paq8p: -26.53% (4.6367 -> 3.4067)

    Note that for a debug build I more or less always hit the 500s timeout on some of these benchmarks with the old pass system! I admit the changes (especially in the DependenceInfo pass) can be made even more efficient, but that is an easy patch I can write afterwards.
  • I don't know enough about the new pass manager to say how this change will affect it.
  • SCoPs affecting each other is, at least in my opinion, a design misconception. We should make the code generation aware of that (e.g., do not split the exit block of a region at 3 different locations in Polly!). However, this change passes lnt and all unit tests, and we detect exactly the same number of SCoPs; that is at least a hint that we can make function passes work. With scalar/PHI code generation and without independent blocks and code prepare, the impact of one SCoP on another should be even smaller.

I hope you reconsider this change, at least after I provide more accurate
numbers.

Best regards,

Johannes

Hey Tobias,

I attached 2 lnt reports to this mail. The first (report_region.json)
was created using the current polly/master. The second
(report_function.json) was created with the region->function pass patch.
Note that this patch does not yet realize its full potential; however, over the 3 runs I measured I got:

Performance Regressions  16
Performance Improvements  159
Unchanged Tests   817
Total Tests   992

The execution-time changes, on the other hand (only 8 in total), should be jitter.

The point I am trying to make here is that 3 runs with the region pass manager take

2:35:27 (3 runs region passes)

but 3 runs with the function pass manager only

2:07:48 (3 runs function passes)

hence we save ~18% of the lnt time (9,327s vs. 7,668s, a reduction of about 17.8%).

To be precise, we save compile time "only" for larger functions with small SCoP
coverage but we do not pay for it in any other case.

Does this convince you that region passes have a bad impact on compile time?

Best regards,

Johannes
zinob edited edge metadata.Mar 4 2015, 10:18 AM

Johannes and Tobias,

We just discovered an issue with compile time in the region pass manager, in particular in release builds where names are not preserved:

lib/Analysis/RegionPass.cpp:86:
dumpPassInfo(P, EXECUTION_MSG, ON_REGION_MSG, CurrentRegion->getNameStr());

getNameStr() is very expensive because LLVM constructs a new name. The compile-time issue you guys are seeing might be due to this.

Our fix right now is to pass an empty string in release builds. Toby, if you think this is an acceptable short-term solution, Sanjin can submit a patch?
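
For illustration, one possible shape of such a guard inside RGPassManager::runOnFunction (a sketch only, assuming PMDataManager's isPassDebuggingExecutionsOrMore() is usable in this context; not necessarily the patch that was eventually committed):

if (isPassDebuggingExecutionsOrMore()) {
  // Region::getNameStr() materializes names for unnamed basic blocks, which
  // is what makes it expensive in release builds where names are stripped.
  dumpPassInfo(P, EXECUTION_MSG, ON_REGION_MSG, CurrentRegion->getNameStr());
}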

-Zino

Ok we will post it shortly.

Zino

The patch Zino refers to was committed in 231485. It did not have a large impact on our LNT performance builders (which do not seem to even see the slowdown we try to address here). However, on my laptop I was able to reproduce this performance issue in a cmake release build and Zino's patch fixes it at least for tramp3d-v4 nicely. Johannes, could you check if there is still a performance issue that needs to be addressed?

lib/Transform/ScheduleOptimizer.cpp
58

Why are these renamed?

521

Is this rename intentional?

The patch Zino refers to was committed in 231485. It did not have a large impact on our LNT performance builders (which do not seem to even see the slowdown we try to address here). However, on my laptop I was able to reproduce this performance issue in a cmake release build and Zino's patch fixes it at least for tramp3d-v4 nicely. Johannes, could you check if there is still a performance issue that needs to be addressed?

How can the buildbots compile e.g. tramp3d in 9 sec when for me, with lnt [1,2], it takes >40 sec for one of the source files alone? This is not new, but they perform that well all the time... I'm puzzled here...

[1] --mllvm=-polly --cflag=-O3 in the command-line options of the lnt runtest command

(Polly is linked into clang/opt)

[2] Alternatively, the command with Polly basically disabled (via -polly-only-func):

/home/johannes/projects/polly/llvm-build/bin/clang++ -fno-exceptions -I/home/johannes/mysandbox/nt/test-2015-03-01_14-37-50/sample-0/MultiSource/Benchmarks/tramp3d-v4 -I/home/johannes/repos/llvm-test-suite/MultiSource/Benchmarks/tramp3d-v4 -I/home/johannes/repos/llvm-test-suite/include -I../../../include -D_GNU_SOURCE -D__STDC_LIMIT_MACROS -DNDEBUG -O3 -mllvm -polly -m64 -fomit-frame-pointer -c /home/johannes/repos/llvm-test-suite/MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.cpp -o /dev/null  -mllvm -polly-only-func=dsadsasadsa -mllvm -polly-allow-nonaffine-branches=false

The patch Zino refers to was committed in 231485. It did not have a large impact on our LNT performance builders (which do not seem to even see the slowdown we try to address here). However, on my laptop I was able to reproduce this performance issue in a cmake release build and Zino's patch fixes it at least for tramp3d-v4 nicely. Johannes, could you check if there is still a performance issue that needs to be addressed?

How can the buildbots compile e.g. tramp3d in 9 sec when for me, with lnt [1,2], it takes >40 sec for one of the source files alone? This is not new, but they perform that well all the time... I'm puzzled here...

Looking at the buildbot history, it seems we never had compile-times below 35 seconds. Can you remember where you got these 9 seconds from?

http://llvm.org/perf/db_default/v4/nts/graph?highlight_run=27014&plot.1349=23.1349.1

Johannes, I also wonder what you plan to do with the patch here. It seems the original compile-time issues that it was meant to fix have been resolved. Are you still planning to submit this patch for other reasons, or should we close this review for now?

Looking at the buildbot history, it seems we never had compile-times below 35 seconds. Can you remember where you got these 9 seconds from?

http://llvm.org/perf/db_default/v4/nts/graph?highlight_run=27014&plot.1349=23.1349.1

My comment is over 2 months old. I cannot remember where I got this number from. Maybe I misread something, or I used a number from a local lnt run.

Johannes, I also wonder what you plan to do with the patch here. It seems the original compile-time issues that it was meant to fix have been resolved. Are you still planning to submit this patch for other reasons, or should we close this review for now?

I'll close this as there doesn't seem to be any interest in it.

jdoerfert abandoned this revision.May 20 2015, 7:05 AM