This patch adds necessary parts to implement iterative compilation in
llvm: decision trees and three passes (ModuleDecisionTreeProxy,
FunctionFitness and ICSetCurrentFunc). Further, inliner and LICM are
modified to support the iterative approach.
Details
Diff Detail
Event Timeline
This is a follow-up work on the iterative compilation framework
for clang/llvm. Previous discussion as well as the introduction to this
approach has been presented at:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-December/thread.html#68784
In this patch, the decision tree proxies are implemented as immutable
pass (as suggested by Hongbin Zheng) in order to avoid changes of
the base llvm classes. There is a new function level pass (ICSetCurrentFunc)
that sets the current function decision tree proxy. Because
immutable passes are destructed before the codegen phase, we
are saving the results in the two files: one before codegen
and the other after the codegen phase.
We plan to apply the proposed framework to the new
compiler phases. Also, machine learning can be used to better
make the initial decisions and the model for the good decisions
can be learned by the iterative compilation. Static
performance analyser (for example one suggested by Andrew Trick)
should be integrated / developed as the objective function that
is minimized.
This patch needs to be applied along with the corresponding patch for
clang. See below for the link.
All suggestions, comments are welcomed.
I took a brief first look to understand what is happening here.
I am a little surprised that there is not a _single_ comment added in this patch. I believe a couple of comments that document higher level design choices would be very helpful. I looked through the older emails, but some questions are still open:
What exactly forms the decision tree? If we have three points at which decisions are taken, will we have three nodes in the tree or depends the size of the tree on the module/program size?
What happens if a decision is reached multiple times? E.g. the inliner run on different functions? Can different decisions be taken at each point?
Similarly, is the decision tree statically known or can it change dynamically depending on what kind of decisions have been taken. E.g. if we decide to run dead code elimination, we may not even have code that could be unrolled such that the corresponding unroller decision points may become invalid?
Do you plan to support non-binary decisions. E.g. different unroll factors?
How does all this fit together. Assuming I would like to run this in a JIT compiler, what are the pieces that need to be plugged together to try to iteratively optimize a certain function?
I also have a couple of stylistic comments:
include/llvm/Analysis/DecisionTree.h | ||
---|---|---|
18 | Maybe LLVM_ANALYSIS_DECISIONTREE_H? | |
lib/Analysis/DecisionTree.cpp | ||
44 | How has this value be choosen? Does it always need to be in sync with BestFunctionFitness? | |
157 | Write: } else { | |
207 | Typo: applyResults, no? | |
379 | C++11 range loops maybe? | |
lib/Target/Mips/MipsFunctionFitness.cpp | ||
47 | Why is this commented out? (The same for the lines above) | |
73 | We commonly start variable names with upper-case letters. |
Looks like you need to add some comments to your code :)
include/llvm/Analysis/DecisionTree.h | ||
---|---|---|
45 | You could use llvm::Bitvector instead of std::vector<bool>. | |
105 | Please briefly explain all these Phases. | |
159 | Maybe you could clarify your code by adding comments like: | |
187 | Now you have "nameFunction" for the name of the current function and "CurrFnDecisionTreeProxy" for the DecisionTree of the current function. How about add a pointer to Function to DecisionTreeProxy so that you can always access the current function (as well as its name) when you have a pointer to the current DecisionTree. Then you can remobe "nameFunction". You my also need a hook to notify the DecisionTree when the function inlining pass inline and delete a function. | |
lib/Analysis/DecisionTree.cpp | ||
38 | Maybe you could use a constant (like MaxDtPriority) here instead of a magic number. | |
91 | Looks like this line of code appears in both branches, you could move it above the "if" statement. | |
104 | In this function you are creating a vector in a function and return it at the end, such an approach need to copy the vector when the function return. You could ask the caller of this function to pass in an empty vector by reference then fill the vector, such that the signature of this function become: | |
255 | How about use LLVM's streams instead of ofstream? Maybe you could write your data out in form of JSON, such that you could write python utilities to analyze/debug/render your data. And it is much more human readable. | |
lib/Transforms/IPO/Inliner.cpp | ||
450 | This "" looks confusing. |
Agreed. Will do.
I looked through the older emails, but some questions are still open:
What exactly forms the decision tree? If we have three points at which decisions are taken, will we have three nodes in the tree or depends the size of the tree on the module/program size?
What happens if a decision is reached multiple times? E.g. the inliner run on different functions? Can different decisions be taken at each point?
Similarly, is the decision tree statically known or can it change dynamically depending on what kind of decisions have been taken. E.g. if we decide to run dead code elimination, we may not even have code that could be unrolled such that the corresponding unroller decision points may become invalid?
Do you plan to support non-binary decisions. E.g. different unroll factors?
How does all this fit together. Assuming I would like to run this in a JIT compiler, what are the pieces that need to be plugged together to try to iteratively optimize a certain function?
Here is a short explanation of the forming of the decision tree.
First, some passes need to be modified to call getDecision function.
If this function returns false, a pass should do exactly as the code without modification. If the return value is true, then the alternative path is taken.
Function getDecison always returns false in the first iteration.
And the compiler should work exactly as if there was no iterrative framework.
Let us assume that getDecision function is called four times in the first iteration of iterative compilation.
After the end of the first iteration, the decision tree will look like this:
o \ o \ o \ o \ o
In the second iteration, exactly one decision is changed and the compiler works as in the first iteration until it reaches this particular decision. Then alternative path is taken in this point.
Further, getDecision function returns false until compilation iteration is finished. Let us assume that we took alternative decision at the third decision and that there are three more calls
of getDecision function. Note that different compilation paths may have different number of calls of getDecision function.
o \ o \ o / \ o o \ \ o o \ o
Every new iteration adds one new branch to the decision tree.
At the end of each iteration fitness of the generated code is evaluated. In the last iteration, the compiler takes the path with the best fitness.
To augment the selection of a node where alternative decision would be taken, getDecision tree takes one parameter.
This parameter is interpreted as priority and the node with highest priory is selected for next alternative decision.
Machine learning approach
Formed decision tree can be used to train a binary classifier which can be used to facilitate existing heuristics. By collecting nodes where branching of decision tree occurred, we have training examples for the classifier. We could replace existing heuristics with this trained classifier and potentially get better code even without iterative approach.
This may be a good approach for jit compiler but it requires adding this machine learning approach which is planed for the future.
N-ary decisions are not planed for now because every n-ary decision can be replaced with n-1 binary decision. For example if we have to decide for a number from zero to three we can set three yes/no questions:
(number is zero or greater then zero?) / \ 0 (number is one or greater then one?) / \ 1 (number is 2 or 3?) / \ 2 3
If some decisions are made relevant by some future decisions this is only some inefficiency of the compilation process but they stay in the decision tree because nodes are never deleted. The nodes represent some decision points in the history, and path from the root of the tree to the leaf has enough information to exactly replay compilation iteration.
include/llvm/Analysis/DecisionTree.h | ||
---|---|---|
18 | Done | |
45 | Done. | |
105 | Done | |
159 | Thanks! Done | |
187 | "nameFunction" is removed. We are now using ModuleDecisionTreeProxies::setCurrentFunctionAndPhase to find and set decision tree proxy for given function name and compilation phase. So function name and | |
lib/Analysis/DecisionTree.cpp | ||
38 | Now INT_MAX is used. | |
44 | Now INT_MAX is used. | |
91 | Done | |
104 | Done | |
157 | Done | |
207 | Fixed | |
255 | Now using llvm::MemoryBuffer for reading and raw_ostream for writing. Temp files are now saved in JSON format and parsed by llmv YAML parser. | |
379 | Done | |
lib/Target/Mips/MipsFunctionFitness.cpp | ||
47 | Commented out lines deleted. | |
73 | OK. Done for some variables. | |
lib/Transforms/IPO/Inliner.cpp | ||
450 | Now function ModuleDecisionTreeProxies::setModuleCompilationPhase() is called to set decision tree for the ModulePhase. |
Thank you for the comments. This is the new version of the patch which addresses the comments received.
The design discussion has stalled and not gone any where so I don't think its useful to review this patch yet.
- Rebase
This version of patch is rebased to current trunk. Old version can be
found here:
- Decision points in pre-codegen and in codegen phase
This version introduces support for decision points in both
pre-codegen and codegen phase.
This feature led to a lot changes in code becuse implementation of the
ImmutablePass is such that an ImmutablePass is destroyed between
pre-codegen and codegen phase. The issue is resolved with temporary
JSON files which pass information between those two phases. Also, we
had to merge results from these phases.
We find that such behaviour of ImmutablePass and boundary between
pre-codegen and codegen phases is a bit strange and that there is
space for improvements which would simplify this patch and
implementation of similar features in llvm.
- New optimization point - RegAllocGreedy
Decision points in codegen phase enabled us to implement a new
optimization point where iterative compilation can take an alternative
path. In this version of patch iterative compilation can take
alternative paths in register allocation, next to LICM and Inliner
which were present in the old version, too.
- Bugfixes
During development we fixed some bugs. Some of them were introduced by
new changes and some were existing even in the old version.
- Testing procedure and current status
We are targeting MIPS architecture and current testing procedure looks
like this:
- create CSiBE test configuration
- preprocess files with gcc
- run all tests with -Oz -target mipsel-unknown-linux
-fiterative-comp=n -S
- assemble output with MIPS gcc toolchain
- measure results with size
Our results show that there are improvements of up to 8.78% in code
size which is promising and we are working on getting such results in
more cases and getting even better results.
Our results also show that making alternative decisions in compilation
path in LICM, Inliner or RegAlloc only (separated) can lead to
improvements in code size up to, respectively, 6%, 5% and 8%. This
results proved our assumption that these optimization points are good
for making decisions to take alternative compilation path is true.
Significant improvements are seen as early as with 10 iterations, but
100 iterations lead to even better results. As expected, improvements
and iteration count are not linearly dependant.
- RFC
We would like to see this feature (iterative compilation) merged into
the LLVM/Clang core and we would like to hear what we should change
and improve so this will be possible in the future.
We would also like to hear from you if you find this approach
interesting and get any suggestion where to steer research and
experiments. Any suggestions which passes have potential for taking
alternative path?
Currently, there are few issues that we are working among which the
most interesting are:
- Register allocator has higher priority than other optimization
points so we will change implementation in that way that all
optimization points can get a chance to take alternative direction
- Our decision whether to take an alternative path or not is based on
a fitness function which can be improved. We will be working on
that, too.
Please, feel free to experiment with this patch and to contact us for
any help and information.
This review did not include llvm-commits as a subscriber, so the review
hasn't been sent to/shared with the list.
Please cancel it and start a new one including the list (adding the list
after the review is created is not a great experience for the list as phab
doesn't resend the intro email with all the details/summary)
I see llvm-commits as one of the subscribers, not sure why the post has not
reached the list.
Please cancel it and start a new one including the list (adding the list
after the review is created is not a great experience for the list as phab
doesn't resend the intro email with all the details/summary)
There is a history of this change here, some questions have been already
asked and answered, I suggest we continue to use it for that sake.
I can resend the summary of the latest patch to the llvm-commit list, if
that would help.
Regards,
Petar
llvm-commits has definitely been there from the start (see http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20140728/228369.html) but this thread was started in July 2014 and the last significant activity was September 2014. I'd suggest starting a new review and linking to this one.
This patch can now be abandoned, as an updated version is available at http://reviews.llvm.org/D12199 .
Maybe LLVM_ANALYSIS_DECISIONTREE_H?