This is an archive of the discontinued LLVM Phabricator instance.

[PartialInl] Enhance partial inliner to handle more complex conditions
ClosedPublic

Authored by davidxl on Apr 19 2017, 3:07 PM.

Details

Diff Detail

Event Timeline

davidxl created this revision.Apr 19 2017, 3:07 PM
eraman edited edge metadata.Apr 19 2017, 3:40 PM

I've a couple of trivial comments so far. Could you please generate a diff with a larger context?

lib/Transforms/IPO/PartialInlining.cpp
40

The comments doesn't convey what the function is doing.

47

nit: change -> chain

50

Why is this named Entries_ and not Entries?

davidxl updated this revision to Diff 95840.Apr 19 2017, 3:44 PM

Addressed review comments .

davidxl marked an inline comment as done.Apr 19 2017, 3:45 PM
davidxl added inline comments.
lib/Transforms/IPO/PartialInlining.cpp
40

Fixed.

50

Fixed.

davide added a subscriber: davide.Apr 19 2017, 4:53 PM
wmi added inline comments.Apr 19 2017, 4:58 PM
lib/Transforms/IPO/PartialInlining.cpp
75

nit: ghain --> chain

201–210

Here we add a limitation that the function only has one return BB to be a partial inline candidate. Is it required? Can we walk forward from entry instead of backward from the return block so as to avoid the requirement of finding single return BB?

299–307

Here we allows a node on the chain with more than two successors. Will such case bring us problem later during extracting?

test/Transforms/CodeExtractor/MultiConditions.ll
5 ↗(On Diff #95840)

we can use opt -instnamer to rename the temporaries.

davidxl marked 2 inline comments as done.Apr 19 2017, 11:26 PM
davidxl added inline comments.
lib/Transforms/IPO/PartialInlining.cpp
201–210

the restriction no it is not required. Changed as suggested.

(Note if the extracted region has other return instruction, extra code will be generated (to handle the return status in caller), so the call to the outlined function is slightly higher).

299–307

Changed the code to limit 2 successors. More than one successors can cause problems or lead to very inefficient outline call (to handle 'multiple entries').

davidxl updated this revision to Diff 95892.Apr 19 2017, 11:27 PM

Addressed review comments. Added more test including multi return case.

wmi edited edge metadata.Apr 20 2017, 10:17 AM

I am thinking of another two issues:

  1. Is it possible that some of the BBs on the chain may be very big and we don't want to partial inline them?
  2. The existing pattern handles if (a || b || c ...) case, but it may not be easy to extend for cases like (a && b && c) and ((a && b) || c). Basically, we want to find a collection of bbs with small sizes starting from entry. The bb collection only have two exits. one of them is ReturnBlock and the other is NonReturnBlock. The edges inside of the collection can be of any pattern.

Sorry I didn't come up with them at the first beginning and dive into details.

Is it possible that some of the BBs on the chain may be very big and we don't want to partial inline them?

Yes, i was thinking adding a limit on the chain length as an option. Probably do this a a follow up -- which also needs to look at profile data. (Note that Partialinlininer is currently not enabled in any optimization pipeline yet).

The existing pattern handles if (a || b || c ...) case, but it may not be easy to extend for cases like (a && b && c) and ((a && b) || c). Basically, we want to find a collection of bbs with small sizes starting from entry. The bb collection only have two exits. one of them is ReturnBlock and the other is NonReturnBlock. The edges inside of the collection can be of any pattern.

a&b&c case is already partially handled -- the partial inliner can partially peel the first iteration which may be good enough. However I do like your suggestion of computing the entry predicate region idea.

Yes, i was thinking adding a limit on the chain length as an option. Probably do this a a follow up -- which also needs to look at profile data. (Note that Partialinlininer is currently not enabled in any optimization pipeline yet).

I think instruction count will be better limit than chain length.

lib/Transforms/IPO/PartialInlining.cpp
333

This will look cleaner if this is made to while (CommonSucc) and peel the else part outside of the while loop (since you always break from there). Perhaps you can even peel the first iteration out of the loop: first find the candidate non-return block, then walk the chain in a loop as long as one successor equals non-return block and the other succesor has a single predecessor. Some more comments will also increase readability.

davidxl updated this revision to Diff 96665.Apr 25 2017, 9:10 PM

Address review feedback. The new version now can handle more general conditions (see test cases).

davide added inline comments.Apr 25 2017, 9:25 PM
lib/Transforms/IPO/PartialInlining.cpp
43

typo, partially inlined. BTW, do you have numbers for different values of MaxNumInlineBlocks ?

66–68

SmallVector<> maybe

119–130

These could probably live in BasicBlockUtils.

133–138

Also this one. Please remove else after return

161–165

Can you add comments about why we break in this case? (i.e. we don't outline blocks without exactly two successors, etc...)

199–201

braces unneeded.

329

explicit type maybe?

336

explicit type maybe?

345–347

return BB == NewReturnBlock || NewEntries.count(BB) , maybe?

381–384

unrelated change, also braces are unneeded.

407

unrelated?

davidxl marked 9 inline comments as done.Apr 25 2017, 9:56 PM
davidxl added inline comments.
lib/Transforms/IPO/PartialInlining.cpp
43

Currently it is set to a high number for stress testing purpose. When partial inliner is enabled in the pipeline, the default value needs to be tuned depending on opt level.

119–130

will leave it here for now.

133–138

removed else.

davidxl updated this revision to Diff 96667.Apr 25 2017, 9:57 PM
davidxl marked an inline comment as done.

addressed review comments.

wmi added a comment.Apr 26 2017, 10:47 AM

Some minor comments.

lib/Transforms/IPO/PartialInlining.cpp
68

I find NonReturnBlockPreds has been set but not used.

135

we can change the else if to if.

220–221

should we use an assertion here instead of an early exit, so we can catch mistake instead of giving up silently.

349–352

ToBeInlined(NewNonReturnBlock) is also false, so can we just do:

for (BasicBlock &BB : *DuplicateFunction)
  if (!ToBeInlined(&BB))
    ToExtract.push_back(&BB);
davidxl marked an inline comment as done.Apr 26 2017, 1:20 PM
davidxl added inline comments.
lib/Transforms/IPO/PartialInlining.cpp
68

Removed

220–221

No. This is designed in way such that the candidate found may be illegal (to simplify the code), so a check here is necessary.

349–352

NewNonReturnBlock needs to be the head of the blocks to be cloned. It is already pushed in, so the filter is needed.

davidxl updated this revision to Diff 96811.Apr 26 2017, 1:28 PM

Address Wei's review feedback. Also fixed a bug found in clang self build test.

wmi accepted this revision.Apr 26 2017, 1:52 PM

LGTM.

This revision is now accepted and ready to land.Apr 26 2017, 1:52 PM
eraman added inline comments.Apr 26 2017, 2:11 PM
lib/Transforms/IPO/PartialInlining.cpp
42

Expressing this as instructions will make this somewhat similar to inlining cost.

146

Cleaner to do return std::make_tuple(Succ1, Succ2) here and the following if

182

No need to initialize it here since GetCommonSucc would return a (nullptr, nullptr) tuple if there is no triangle shape.

davidxl marked 2 inline comments as done.Apr 26 2017, 2:33 PM
davidxl added inline comments.
lib/Transforms/IPO/PartialInlining.cpp
42

This will be done in a follow up patch before the final patch that turn it on.

davidxl updated this revision to Diff 96824.Apr 26 2017, 2:34 PM

Address review comments.

eraman added inline comments.Apr 26 2017, 4:57 PM
lib/Transforms/IPO/PartialInlining.cpp
161

Some comment here will be useful. You want to keep space for 1 return block and the CurrEntry and that's why you've the above. I initially thought the condition was wrong.

238

The HasNonEntryPred lambda checks for the presence of preds in Entries (the DenseSet above), but you're not updating the Entries in this loop when you insert Cand into OutliningInfo->Entries. So this loop will return after 1 iteration right?

davidxl added inline comments.Apr 26 2017, 8:42 PM
lib/Transforms/IPO/PartialInlining.cpp
161

Fixed. Added a member function.

238

good catch.Fixed.

davidxl updated this revision to Diff 96863.Apr 26 2017, 8:43 PM

Addressed review feedback by eraman.

eraman accepted this revision.Apr 26 2017, 8:55 PM

LGTM.

Please add a comment to explain the condition for exiting the loop for exceeding MaxNumInlineBlocks.

lib/Transforms/IPO/PartialInlining.cpp
161

You should still add a comment in the member function about the +1 (the return block) and here (we still have a pending CurrEntry block).

This revision was automatically updated to reflect the committed changes.
gyiu added a subscriber: gyiu.Jul 21 2017, 12:23 PM
gyiu added inline comments.
llvm/trunk/lib/Transforms/IPO/PartialInlining.cpp
299 ↗(On Diff #96932)

@davidxl Please correct me if I'm wrong, but in this loop, if the first instruction in the BB is not a PHINode we give up? If that's the case, then wouldn't 'Phi' be the first and only Phi we'll find?

davidxl added inline comments.Jul 21 2017, 12:34 PM
llvm/trunk/lib/Transforms/IPO/PartialInlining.cpp
299 ↗(On Diff #96932)

The method returns the first phi of the block if it exists. It should add this:

if (isa<DbgInfoIntrinsic>(I))

continue;