This is an archive of the discontinued LLVM Phabricator instance.

[PartialInl] Enhance partial inliner to handle more complex conditions
ClosedPublic

Authored by davidxl on Apr 19 2017, 3:07 PM.

Details

Diff Detail

Event Timeline

davidxl created this revision.Apr 19 2017, 3:07 PM
eraman edited edge metadata.Apr 19 2017, 3:40 PM

I've a couple of trivial comments so far. Could you please generate a diff with a larger context?

lib/Transforms/IPO/PartialInlining.cpp
40

The comments doesn't convey what the function is doing.

47

nit: change -> chain

50

Why is this named Entries_ and not Entries?

davidxl updated this revision to Diff 95840.Apr 19 2017, 3:44 PM

Addressed review comments .

davidxl marked an inline comment as done.Apr 19 2017, 3:45 PM
davidxl added inline comments.
lib/Transforms/IPO/PartialInlining.cpp
40

Fixed.

50

Fixed.

davide added a subscriber: davide.Apr 19 2017, 4:53 PM
wmi added inline comments.Apr 19 2017, 4:58 PM
lib/Transforms/IPO/PartialInlining.cpp
40

nit: ghain --> chain

110–119

Here we add a limitation that the function only has one return BB to be a partial inline candidate. Is it required? Can we walk forward from entry instead of backward from the return block so as to avoid the requirement of finding single return BB?

158–166

Here we allows a node on the chain with more than two successors. Will such case bring us problem later during extracting?

test/Transforms/CodeExtractor/MultiConditions.ll
6

we can use opt -instnamer to rename the temporaries.

davidxl marked 2 inline comments as done.Apr 19 2017, 11:26 PM
davidxl added inline comments.
lib/Transforms/IPO/PartialInlining.cpp
110–119

the restriction no it is not required. Changed as suggested.

(Note if the extracted region has other return instruction, extra code will be generated (to handle the return status in caller), so the call to the outlined function is slightly higher).

158–166

Changed the code to limit 2 successors. More than one successors can cause problems or lead to very inefficient outline call (to handle 'multiple entries').

davidxl updated this revision to Diff 95892.Apr 19 2017, 11:27 PM

Addressed review comments. Added more test including multi return case.

wmi edited edge metadata.Apr 20 2017, 10:17 AM

I am thinking of another two issues:

  1. Is it possible that some of the BBs on the chain may be very big and we don't want to partial inline them?
  2. The existing pattern handles if (a || b || c ...) case, but it may not be easy to extend for cases like (a && b && c) and ((a && b) || c). Basically, we want to find a collection of bbs with small sizes starting from entry. The bb collection only have two exits. one of them is ReturnBlock and the other is NonReturnBlock. The edges inside of the collection can be of any pattern.

Sorry I didn't come up with them at the first beginning and dive into details.

Is it possible that some of the BBs on the chain may be very big and we don't want to partial inline them?

Yes, i was thinking adding a limit on the chain length as an option. Probably do this a a follow up -- which also needs to look at profile data. (Note that Partialinlininer is currently not enabled in any optimization pipeline yet).

The existing pattern handles if (a || b || c ...) case, but it may not be easy to extend for cases like (a && b && c) and ((a && b) || c). Basically, we want to find a collection of bbs with small sizes starting from entry. The bb collection only have two exits. one of them is ReturnBlock and the other is NonReturnBlock. The edges inside of the collection can be of any pattern.

a&b&c case is already partially handled -- the partial inliner can partially peel the first iteration which may be good enough. However I do like your suggestion of computing the entry predicate region idea.

Yes, i was thinking adding a limit on the chain length as an option. Probably do this a a follow up -- which also needs to look at profile data. (Note that Partialinlininer is currently not enabled in any optimization pipeline yet).

I think instruction count will be better limit than chain length.

lib/Transforms/IPO/PartialInlining.cpp
193

This will look cleaner if this is made to while (CommonSucc) and peel the else part outside of the while loop (since you always break from there). Perhaps you can even peel the first iteration out of the loop: first find the candidate non-return block, then walk the chain in a loop as long as one successor equals non-return block and the other succesor has a single predecessor. Some more comments will also increase readability.

davidxl updated this revision to Diff 96665.Apr 25 2017, 9:10 PM

Address review feedback. The new version now can handle more general conditions (see test cases).

davide added inline comments.Apr 25 2017, 9:25 PM
lib/Transforms/IPO/PartialInlining.cpp
43

typo, partially inlined. BTW, do you have numbers for different values of MaxNumInlineBlocks ?

66–68

SmallVector<> maybe

120–131

These could probably live in BasicBlockUtils.

134–139

Also this one. Please remove else after return

162–166

Can you add comments about why we break in this case? (i.e. we don't outline blocks without exactly two successors, etc...)

203–205

braces unneeded.

319

explicit type maybe?

326

explicit type maybe?

335–337

return BB == NewReturnBlock || NewEntries.count(BB) , maybe?

374–376

unrelated change, also braces are unneeded.

401

unrelated?

davidxl marked 9 inline comments as done.Apr 25 2017, 9:56 PM
davidxl added inline comments.
lib/Transforms/IPO/PartialInlining.cpp
43

Currently it is set to a high number for stress testing purpose. When partial inliner is enabled in the pipeline, the default value needs to be tuned depending on opt level.

120–131

will leave it here for now.

134–139

removed else.

davidxl updated this revision to Diff 96667.Apr 25 2017, 9:57 PM
davidxl marked an inline comment as done.

addressed review comments.

wmi added a comment.Apr 26 2017, 10:47 AM

Some minor comments.

lib/Transforms/IPO/PartialInlining.cpp
68

I find NonReturnBlockPreds has been set but not used.

136

we can change the else if to if.

222–223

should we use an assertion here instead of an early exit, so we can catch mistake instead of giving up silently.

340–343

ToBeInlined(NewNonReturnBlock) is also false, so can we just do:

for (BasicBlock &BB : *DuplicateFunction)
  if (!ToBeInlined(&BB))
    ToExtract.push_back(&BB);
davidxl marked an inline comment as done.Apr 26 2017, 1:20 PM
davidxl added inline comments.
lib/Transforms/IPO/PartialInlining.cpp
68

Removed

222–223

No. This is designed in way such that the candidate found may be illegal (to simplify the code), so a check here is necessary.

340–343

NewNonReturnBlock needs to be the head of the blocks to be cloned. It is already pushed in, so the filter is needed.

davidxl updated this revision to Diff 96811.Apr 26 2017, 1:28 PM

Address Wei's review feedback. Also fixed a bug found in clang self build test.

wmi accepted this revision.Apr 26 2017, 1:52 PM

LGTM.

This revision is now accepted and ready to land.Apr 26 2017, 1:52 PM
eraman added inline comments.Apr 26 2017, 2:11 PM
lib/Transforms/IPO/PartialInlining.cpp
42

Expressing this as instructions will make this somewhat similar to inlining cost.

147

Cleaner to do return std::make_tuple(Succ1, Succ2) here and the following if

183

No need to initialize it here since GetCommonSucc would return a (nullptr, nullptr) tuple if there is no triangle shape.

davidxl marked 2 inline comments as done.Apr 26 2017, 2:33 PM
davidxl added inline comments.
lib/Transforms/IPO/PartialInlining.cpp
42

This will be done in a follow up patch before the final patch that turn it on.

davidxl updated this revision to Diff 96824.Apr 26 2017, 2:34 PM

Address review comments.

eraman added inline comments.Apr 26 2017, 4:57 PM
lib/Transforms/IPO/PartialInlining.cpp
160

Some comment here will be useful. You want to keep space for 1 return block and the CurrEntry and that's why you've the above. I initially thought the condition was wrong.

237

The HasNonEntryPred lambda checks for the presence of preds in Entries (the DenseSet above), but you're not updating the Entries in this loop when you insert Cand into OutliningInfo->Entries. So this loop will return after 1 iteration right?

davidxl added inline comments.Apr 26 2017, 8:42 PM
lib/Transforms/IPO/PartialInlining.cpp
160

Fixed. Added a member function.

237

good catch.Fixed.

davidxl updated this revision to Diff 96863.Apr 26 2017, 8:43 PM

Addressed review feedback by eraman.

eraman accepted this revision.Apr 26 2017, 8:55 PM

LGTM.

Please add a comment to explain the condition for exiting the loop for exceeding MaxNumInlineBlocks.

lib/Transforms/IPO/PartialInlining.cpp
160

You should still add a comment in the member function about the +1 (the return block) and here (we still have a pending CurrEntry block).

This revision was automatically updated to reflect the committed changes.
gyiu added a subscriber: gyiu.Jul 21 2017, 12:23 PM
gyiu added inline comments.
llvm/trunk/lib/Transforms/IPO/PartialInlining.cpp
299 ↗(On Diff #96932)

@davidxl Please correct me if I'm wrong, but in this loop, if the first instruction in the BB is not a PHINode we give up? If that's the case, then wouldn't 'Phi' be the first and only Phi we'll find?

davidxl added inline comments.Jul 21 2017, 12:34 PM
llvm/trunk/lib/Transforms/IPO/PartialInlining.cpp
299 ↗(On Diff #96932)

The method returns the first phi of the block if it exists. It should add this:

if (isa<DbgInfoIntrinsic>(I))

continue;