This is an archive of the discontinued LLVM Phabricator instance.

[PartialInlining] Reduce function outlining overhead
ClosedPublic

Authored by davidxl on May 26 2017, 5:36 PM.

Details

Summary

Allocas whose life range is fully contained inside the outlined region should be pushed to the outlined function. This not only reduces the number of live-in variables to the outline function (thus reduced number of parameters), but can also avoid 'escaping' the address of the locals (if not shrink wrapped in).

Diff Detail

Repository
rL LLVM

Event Timeline

davidxl created this revision.May 26 2017, 5:36 PM
davidxl updated this revision to Diff 100537.May 27 2017, 2:02 PM
davide edited edge metadata.May 27 2017, 4:07 PM

This is a really important optimization, IMHO (and for my usecase). Thanks for working on this.
Some comments inline.

include/llvm/Transforms/Utils/CodeExtractor.h
99–108 ↗(On Diff #100537)

Can you update this comment to reflect what's the Allocas argument here?
Also, probably this should be names SinkCandidates or something?

lib/Transforms/Utils/CodeExtractor.cpp
145 ↗(On Diff #100537)

Now that I look at this more closely, findInputsOutputs is a slightly inaccurate description after you change. Maybe we can consider splitting this in a separate function, as this one is growing quite a bit?

178–188 ↗(On Diff #100537)

I think this can be expressed much more concisely using std/llvm::any_of.

200–202 ↗(On Diff #100537)

While you're here, I'd see if you can use a range for instead (and maybe avoid the extra assignment to value, but I'm not sure if it's possible).

817 ↗(On Diff #100537)

I'm not entirely sure I like this bit.
dyn_cast<> could return nullptr, resulting in trouble when deferencing. If you're always guaranteed to get a function back, I'd use cast<>, (or, alternatively use if (dyn_cast<>)

davidxl marked 4 inline comments as done.May 28 2017, 9:54 AM
davidxl added inline comments.
lib/Transforms/Utils/CodeExtractor.cpp
200–202 ↗(On Diff #100537)

This will need begin/end methods in Instruction.

davidxl updated this revision to Diff 100554.May 28 2017, 9:55 AM

addressed Davide's feedback.

wmi added inline comments.May 28 2017, 5:33 PM
lib/Transforms/Utils/CodeExtractor.cpp
146–167 ↗(On Diff #100554)

Here it requires lifetime.start and lifetime.end are in the blocks to be extracted.
I guess there are also many cases for which we have lifetime.start appear in entry block and lifetime.end appear in return block, and the actual uses of the alloca object are all in the blocks to be extracted. For those cases, we can sink the lifetime.start and lifetime.end together into the extracted func. To do that, we may need to start from alloca instructions in non-extracted blocks instead of lifetime.start in extracted blocks. Since the number of non-extracted blocks will be smaller than extracted blocks, so I guess it will be more efficient?


This is the small case I try:

class A {
public:
void memfunc();
};
int cond;

void foo() {
A a;
if (cond)
return;
a.memfunc();
return;
}

void goo() {
foo();
}

The IR of foo:
define void @_Z3foov() local_unnamed_addr #0 {
entry:
%a = alloca %class.A, align 1
%0 = getelementptr inbounds %class.A, %class.A* %a, i64 0, i32 0
call void @llvm.lifetime.start.p0i8(i64 1, i8* nonnull %0) #3
%1 = load i32, i32* @cond, align 4, !tbaa !2
%tobool = icmp eq i32 %1, 0
br i1 %tobool, label %if.end, label %cleanup

if.end:
call void @_ZN1A7memfuncEv(%class.A* nonnull %a) #3
br label %cleanup

cleanup:
call void @llvm.lifetime.end.p0i8(i64 1, i8* nonnull %0) #3
ret void
}

For the testcase, the lifetime.start and lifetime.end are not in the blocks to be extracted, but it is beneficial to have the alloca sinked.

yes, those cases can be and will be handled but as a follow up. This patch handles cases where locals are declared in the outlined region.

To be clear, the case handled here is
void foo()
{

...
{
     A a;
     ...
 }

}

davidxl updated this revision to Diff 100581.May 28 2017, 11:54 PM

Address Wei's feedback. Find alloca candidates from Entry blocks .

wmi edited edge metadata.May 30 2017, 2:07 PM

I have no other comments. See whether Davide has further comments.

lib/Transforms/Utils/CodeExtractor.cpp
783 ↗(On Diff #100581)

They can be moved closer to their uses.

davide accepted this revision.May 30 2017, 2:09 PM

LGTM

This revision is now accepted and ready to land.May 30 2017, 2:09 PM

(sent yesterday but forgot to hit submit).

This revision was automatically updated to reflect the committed changes.