This is a work-in-progress patch to enable interprocedural optimization of
libquantum with Polly. It is not yet intended for submission, but illustrates
some of the pass-pipeline changes needed to get end-to-end interprocedural loop
fusion working. Some of the choices we make could potentially be improved using
interprocedural scop modeling, but the LLVM inliner seems to be pretty close
to getting things right.
The core transformation in Polly that enables this optimization is the sparse
representation of a scop model, which I collaborated on with Jan Sjoedermann
(a student of David Chisnall) in the context of array bounds checking and
later with Johannes Doerfert in the context of LBM and libquantum.
Allow partial inlining of vararg functions
Make gold-plugin compile for me
Add Polly support to gold plugin
Disallow remainders in loop unroller
The loop unroller runs in the per-TU compilations even in LTO mode and blows up
code size without reason. In LTO mode, per-TU compilations should canonicalize,
not specialize.
Adjust Pass manager for LTO+Polly+libquantum
- Enable partial inlining
- Add partial inlining to LTO pipeline
- Add Polly to LTO pipeline
- Disable Polly in per-TU pipeline
Inliner: disable single-callsite static bonus
In libquantum, the quantum*_ft functions should not be inlined, as inlining them
prevents the inlining of the non-ft functions, which is needed to enable loop
fusion with Polly.
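To illustrate why the non-ft functions need to be inlined, here is a minimal
sketch of the kind of loop fusion this is meant to unlock. The functions gate_a,
gate_b, apply_unfused and apply_fused are hypothetical stand-ins, not actual
libquantum code, and the fused loop is written by hand to show the intended
result rather than produced by Polly.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for two non-ft gate functions. Each one makes a
// full pass over the state vector.
void gate_a(std::vector<int> &state) {
  for (std::size_t i = 0; i < state.size(); ++i)
    state[i] ^= 1;
}

void gate_b(std::vector<int> &state) {
  for (std::size_t i = 0; i < state.size(); ++i)
    state[i] += 2;
}

// Before inlining: the two loops live in different functions, so a loop
// optimizer that works on one function at a time sees neither of them here.
void apply_unfused(std::vector<int> &state) {
  gate_a(state);
  gate_b(state);
}

// After inlining both calls, the two loops sit next to each other in the
// caller and can be fused into a single pass. Fusion is legal in this
// sketch because iteration i only touches state[i].
void apply_fused(std::vector<int> &state) {
  for (std::size_t i = 0; i < state.size(); ++i) {
    state[i] ^= 1;
    state[i] += 2;
  }
}
```

Both variants compute the same result; the fused one traverses the state
vector once instead of twice.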
As the fault-tolerant versions of the libquantum functions are rather large, the
LLVM inliner would not inline them by default. However, in certain cases the
fact that the _ft functions are only called once causes the single-callsite
bonus to be applied, which allows LLVM to inline the _ft functions. This in
turn prevents later inlining of the non-ft functions and consequently prevents
later loop fusion. Interestingly, LLVM already checks in shouldBeDeferred
whether inlining a leaf function prevents other, possibly more beneficial
inlining opportunities, such as inlining of the non-ft functions. While
shouldBeDeferred seems to work in general, applying the single-callsite bonus
drops the overall CandidateCost below zero, such that shouldBeDeferred is
effectively useless.
It seems that we should evaluate shouldBeDeferred without the single-callsite
bonus being applied and only add the single-callsite bonus after
shouldBeDeferred has been evaluated. For now, just disable the single-callsite
bonus.
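The ordering problem described above can be sketched with a toy cost model.
This is not LLVM's inliner code: the Candidate struct, the shouldDefer helper,
both decide functions and all numbers used with them are assumptions made up
for illustration. The only point is that subtracting the bonus before the
deferral check can push the cost below zero, after which the check can no
longer fire.

```cpp
enum class Decision { Inline, Defer, Reject };

// Hypothetical description of one inlining candidate.
struct Candidate {
  int baseCost;            // cost before any bonus is applied
  int singleCallsiteBonus; // bonus subtracted for a single call site
  int threshold;           // inline if the final cost is below this
  bool blocksOtherInline;  // inlining here prevents another inlining
  int otherInlineCost;     // cost of that other, blocked inlining
};

// Toy deferral check: defer when the blocked alternative is cheaper than
// the cost we are about to pay for this candidate.
static bool shouldDefer(const Candidate &c, int cost) {
  return c.blocksOtherInline && c.otherInlineCost < cost;
}

// Ordering that causes the problem: the bonus is applied first, so a large
// bonus makes the cost negative and the deferral check never triggers.
Decision decideBonusFirst(const Candidate &c) {
  int cost = c.baseCost - c.singleCallsiteBonus;
  if (shouldDefer(c, cost))
    return Decision::Defer;
  return cost < c.threshold ? Decision::Inline : Decision::Reject;
}

// Suggested ordering: run the deferral check on the cost without the
// bonus, and apply the bonus only afterwards.
Decision decideDeferralFirst(const Candidate &c) {
  if (shouldDefer(c, c.baseCost))
    return Decision::Defer;
  int cost = c.baseCost - c.singleCallsiteBonus;
  return cost < c.threshold ? Decision::Inline : Decision::Reject;
}
```

With a made-up candidate such as Candidate{1000, 15000, 225, true, 300},
decideBonusFirst inlines the candidate, while decideDeferralFirst defers it and
so leaves the other inlining opportunity available.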
Hi Florian,
are you interested in upstreaming this hack as well?
AFAIU we should subtract this LastCallBonus only after shouldBeDeferred has been called as otherwise shouldBeDeferred might not have any effect?