This is an archive of the discontinued LLVM Phabricator instance.

Enable interprocedural optimization in libquantum - LLVM-part [WIP]
Needs Review
Public

Authored by grosser on Oct 5 2017, 7:44 AM.

Details

Reviewers
bollu
Summary

This is a work-in-progress patch to enable interprocedural optimization of
libquantum with Polly. It is not yet intended for submission, but illustrates
some of the pass-pipeline changes needed to get end-to-end interprocedural loop
fusion working. Some of the choices we make could potentially be improved using
interprocedural scop modeling, but the LLVM inliner seems to be pretty close
to getting things right.
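
To make the goal concrete, here is a minimal sketch of what interprocedural loop fusion means. The function names and the bit-flip "gate" are hypothetical stand-ins and are not taken from libquantum. Two calls that each traverse the same array can only be fused into one traversal once both callees have been inlined into the same caller.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a gate that touches every basis state.
void flipBit(std::vector<std::uint64_t> &States, unsigned Bit) {
  assert(Bit < 64 && "bit index out of range");
  for (std::uint64_t &S : States)
    S ^= (std::uint64_t{1} << Bit);
}

// Before: two calls, hence two separate traversals of States.
void applyTwoGates(std::vector<std::uint64_t> &States) {
  flipBit(States, 0);
  flipBit(States, 3);
}

// After inlining both calls and fusing the loops: a single traversal.
// This is written by hand to show the intended result, not compiler output.
void applyTwoGatesFused(std::vector<std::uint64_t> &States) {
  for (std::uint64_t &S : States) {
    S ^= (std::uint64_t{1} << 0);
    S ^= (std::uint64_t{1} << 3);
  }
}
```

Both versions compute the same states; the fused one reads and writes each element once instead of twice.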

The core transformation in Polly that enables this optimization is the sparse
representation of a scop model, which I collaborated on with Jan Sjoedermann
(a student of David Chisnall) in the context of array bounds checking and then
later with Johannes Doerfert in the context of LBM and libquantum.

Allow partial inlining of vararg functions

Make gold-plugin compile for me

Add polly support to gold plugin

Disallow remainders in loop unroller

The loop unroller is (even as part of LTO mode) run in the per-TU compilations
and blows up code size without reason. In LTO mode, per-TU compilations should
canonicalize, not specialize.
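
As an illustration of the code-size concern, the following hand-written sketch contrasts a rolled loop with the shape produced by unrolling by four with a remainder loop. The function names and the unroll factor are assumptions chosen for the example; this is not output of the LLVM unroller.

```cpp
#include <cassert>
#include <cstddef>

// Rolled, canonical form: a single loop with one body.
long sumRolled(const int *A, std::size_t N) {
  assert(A != nullptr || N == 0);
  long Sum = 0;
  for (std::size_t I = 0; I < N; ++I)
    Sum += A[I];
  return Sum;
}

// Unrolled by 4 with a remainder loop: the body now appears five times
// (four copies in the main loop, one in the remainder loop).
long sumUnrolledBy4(const int *A, std::size_t N) {
  assert(A != nullptr || N == 0);
  long Sum = 0;
  std::size_t I = 0;
  for (; I + 4 <= N; I += 4) {
    Sum += A[I];
    Sum += A[I + 1];
    Sum += A[I + 2];
    Sum += A[I + 3];
  }
  // Remainder loop: handles the last N % 4 elements.
  for (; I < N; ++I)
    Sum += A[I];
  return Sum;
}
```

Both functions return the same sum. The second form has two loops where the rolled form has one, which is the kind of early specialization the per-TU pipeline should avoid before LTO.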

Adjust Pass manager for LTO+Polly+libquantum

  • Enable partial inlining
  • Add partial inlining to LTO pipeline
  • Add Polly to LTO pipeline
  • Disable Polly in per-TU pipeline

Inliner: disable single-callsite static bonus

In libquantum the quantum*_ft functions should not be inlined, as this inlining
prevents the inlining of the non-ft functions, which is needed to enable loop
fusion with Polly.

As the fault-tolerant versions of the libquantum functions are rather large, the
LLVM inliner would not inline them by default. However, in certain cases the
fact that the _ft functions are only called once causes the single-call-site
bonus to be applied, which allows LLVM to inline the _ft functions and as a
result prevents later inlining of the non-ft functions and consequently prevents
later loop fusion. Interestingly, LLVM already checks in shouldBeDeferred if
inlining of a leaf function prevents other possibly more beneficial inlining
opportunities such as inlining of the non-ft functions. While shouldBeDeferred
seems to work in general, the single-callsite bonus being applied drops the
overall CandidateCost below zero, such that shouldBeDeferred is effectively
useless.

It seems that we should evaluate "shouldBeDeferred" without the single
callsite bonus being applied and only add the single callsite bonus after
shouldBeDeferred has been evaluated. For now just disable the single callsite
bonus.
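
The ordering problem can be sketched with a toy model. Everything below is hypothetical: the struct, the function names, and the numbers are illustrative and do not reproduce the code in InlineCost.cpp or the inliner. The model only captures the idea described above, namely that a deferral check comparing an outer call site's remaining budget against the candidate cost can never fire once a large bonus has pushed that cost below zero.

```cpp
#include <cassert>
#include <vector>

// Toy model only; names and numbers are hypothetical, not LLVM's.
struct OuterCallSite {
  // Remaining budget (threshold minus cost) for inlining the caller
  // into one of its own callers.
  int CostDelta;
};

// Defer if inlining the candidate would use up the remaining budget of
// at least one outer call site.
bool shouldDefer(int CandidateCost, const std::vector<OuterCallSite> &Outer) {
  for (const OuterCallSite &CS : Outer)
    if (CS.CostDelta <= CandidateCost)
      return true;
  return false;
}

// Variant A: the single-callsite bonus is folded into the cost first.
bool deferWithBonusFirst(int RawCost, int Bonus,
                         const std::vector<OuterCallSite> &Outer) {
  assert(Bonus >= 0);
  return shouldDefer(RawCost - Bonus, Outer);
}

// Variant B: the deferral check sees the raw cost; the bonus would only
// be applied afterwards, if the call site is not deferred.
bool deferWithBonusLast(int RawCost,
                        const std::vector<OuterCallSite> &Outer) {
  return shouldDefer(RawCost, Outer);
}
```

With a raw cost of 900, a bonus of 15000, and one outer call site with a remaining budget of 200, variant A compares 200 against -14100 and does not defer, while variant B compares 200 against 900 and defers.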

Event Timeline

grosser created this revision. Oct 5 2017, 7:44 AM
fhahn added a subscriber: fhahn. Oct 5 2017, 9:40 AM

Florian Hahn, feel free to pick this one up.

grosser added inline comments. Nov 16 2017, 9:39 AM
lib/Analysis/InlineCost.cpp
844

Hi Florian,

are you interested in upstreaming this hack as well?

AFAIU we should subtract this LastCallBonus only after shouldBeDeferred has been called as otherwise shouldBeDeferred might not have any effect?

fhahn added inline comments. Nov 21 2017, 8:05 AM
lib/Analysis/InlineCost.cpp
844

Sure, I'll look into it

Hi Florian,

any update on the partial inliner changes?

fhahn added a comment. Dec 9 2017, 4:51 AM

Hi Tobias,

I committed the vararg support a while ago. There is a patch under review to enable partial inlining by default (D40477), but I think it needs more benchmarking. I'll look into the LastCallBonus stuff soon!

Cheers,
Florian