This is an archive of the discontinued LLVM Phabricator instance.

[PartialInlining] Enable recursive partial inlining.
ClosedPublic

Authored by rudkx on Oct 20 2022, 2:44 PM.

Details

Summary

It seems unnecessarily limiting to disallow recursive partial
inlining, and there are clearly cases where it can benefit
code by avoiding a function call and potentially enabling
other transformations like dead argument elimination
in cases where an argument is only used prior to the early-out
test at the top of the function.

The pass already properly rewrites the recursive calls
within the body of the freshly cloned function, so the only
change here is removing the bail-out when recursion is
detected.

Diff Detail

Event Timeline

rudkx created this revision.Oct 20 2022, 2:44 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 20 2022, 2:44 PM
rudkx requested review of this revision.Oct 20 2022, 2:44 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 20 2022, 2:44 PM

Can one of the listed reviewers please review, or suggest alternative reviewers?

This was the best list I could come up with based on recent active history in this file that wasn't code clean up, and based on CODE_OWNERS.TXT.

Thanks!

are you see that this is beneficial in practice?

llvm/test/Transforms/PartialInlining/recursive_partial_inlining.ll
3

can you remove the simplifycfg run so we can more accurately see what the pass is doing?

rudkx added a comment.Nov 11 2022, 1:52 PM

are you see that this is beneficial in practice?

Yes, I have an internal test case for which this results in the only references to a pointer argument being moved out of the function, which then allows dead argument elimination to remove that argument, which further results in that stack data that was being passed indirectly to be split by SROA and moved off the stack into registers.

rudkx updated this revision to Diff 474875.Nov 11 2022, 3:28 PM

Update test case to not use --simplify-cfg.

rudkx marked an inline comment as done.Nov 11 2022, 3:29 PM
rudkx added inline comments.
llvm/test/Transforms/PartialInlining/recursive_partial_inlining.ll
3

Done.

Thanks for taking a look!

The obvious concern for making inlining more aggressive is potential codesize growth. Does the cost model prevent us from cloning the function multiple times? How much growth are we talking about for practical code?

llvm/test/Transforms/PartialInlining/recursive_partial_inlining.ll
2

update_test_checks.py has a flag --include-generated-funcs you might want to use here.

rudkx marked an inline comment as done.Nov 16 2022, 8:22 AM
rudkx added inline comments.
llvm/test/Transforms/PartialInlining/recursive_partial_inlining.ll
2

I tried using that flag, and unfortunately it seems to be generating check lines that FileCheck fails to match. The check lines look fine to me, so it may be a FileCheck bug. I've had similar issues in the past with hand-written lines where FileCheck seems to either get hung-up on trailing text on a line, or is having a problem with variable captures that look identical but somehow it doesn't think match.

rudkx added a comment.Dec 2 2022, 10:17 AM

The obvious concern for making inlining more aggressive is potential codesize growth. Does the cost model prevent us from cloning the function multiple times? How much growth are we talking about for practical code?

Hi @efriedma. The majority of the body of the function will get cloned exactly once. The early-exit portion of the function (leading to the first branch) will get cloned once per call-site. Since the only change here is to also allow partial inlining to happen for recursive functions, I would expect code growth in general to be quite small over just having this enabled for non-recursive functions. In other words, I would expect if there were a code growth problem as a result of partial inlining we'd already be seeing it since non-recursive functions are going to be the much more common case.

Having said that, I was hoping to try something like a bootstrap build of clang to see if there was any noticeable impact, but the instructions I found online for doing those builds don't seem to work. In an internal test suite I found that this only fired a handful of times (most likely because the code shape that this pass detects is very specific, and the cost model is pretty conservative).

If you still have concerns I'd be happy to gather some data if you have a suggestion of what you think would be meaningful.

efriedma added a comment.EditedDec 2 2022, 10:59 AM

Not sure how the CMake 2-stage bits work, but you can always manually "bootstrap": build clang, then do a separate clang build with -DCMAKE_CXX_COMPILER=/path/to/built/clang++ .

rudkx added a comment.Dec 2 2022, 9:12 PM

Not sure how the CMake 2-stage bits work, but you can always manually "bootstrap": build clang, then do a separate clang build with -DCMAKE_CXX_COMPILER=/path/to/built/clang++ .

Yes good point. I gave that a try and I see no difference in size in the text section of clang, so it looks like it's either not hitting at all, or it's not causing enough code growth to round to the next page.

efriedma accepted this revision.Dec 5 2022, 9:49 AM

In that case, I'm not really concerned here; LGTM

This revision is now accepted and ready to land.Dec 5 2022, 9:49 AM
rudkx added a comment.Dec 5 2022, 1:16 PM

Great, thanks again @efriedma.

rudkx updated this revision to Diff 480326.Dec 5 2022, 9:01 PM

Rebase on top of tree.

This revision was landed with ongoing or failed builds.Dec 5 2022, 11:10 PM
This revision was automatically updated to reflect the committed changes.