Page MenuHomePhabricator

[HotColdSplit] Reflect full cost of parameters in split penalty
AcceptedPublic

Authored by vsk on Mar 22 2019, 1:47 PM.

Details

Summary

Make the penalty for splitting a region more accurately reflect the cost
of materializing all of the inputs/outputs to/from the region.

This almost entirely eliminates code growth within functions which
undergo splitting in key internal frameworks, and reduces the size of
those frameworks between 2.6% to 3%.

rdar://49167240

Diff Detail

Event Timeline

vsk created this revision.Mar 22 2019, 1:47 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 22 2019, 1:47 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
vsk updated this revision to Diff 191950.Mar 22 2019, 2:24 PM
  • Increase test coverage.

LGTM, sorry for not reviewing this patch. I wasn't actively reviewing that time. Please update the diff.

hiraditya added inline comments.Aug 2 2020, 5:38 PM
llvm/lib/Transforms/IPO/HotColdSplitting.cpp
89

Should we initialize this to be same as # of argument registers. Any arguments more than supported by argument registers are generally stored on stack and may add to code-size overhead both in caller and callee.

288

Do we have a limit on max # successors outside of the region? In pathological cases this piece of code may have super linear time complexity. If we dont encounter this often then this seems fine though.

vsk added a comment.Aug 6 2020, 4:22 PM

I haven't forgotten about this! Hope to have an update by Monday (8/10).

llvm/lib/Transforms/IPO/HotColdSplitting.cpp
89

Ah, do you mean the number of arguments that can be passed by register? If so, yes, that seems like a straightforward improvement.

288

No, we don't have a limit like that, but I'd be open to adding one if a pathological case is found. I've never observed a performance issue here (in practice |SuccsOutsideRegion| is very small).

hiraditya added inline comments.Aug 6 2020, 7:39 PM
llvm/lib/Transforms/IPO/HotColdSplitting.cpp
89

yes

288

I've never observed a performance issue here (in practice |SuccsOutsideRegion| is very small).

In that case the current approach is fine.

vsk marked an inline comment as not done.Aug 10 2020, 1:01 PM
vsk added inline comments.
llvm/lib/Transforms/IPO/HotColdSplitting.cpp
89

I had a look at TargetLowering, TargetCallingConv and a few related interfaces, but couldn't find a way to determine the number of gprs available for pointer-sized call arguments. The necessary information is available in .td files, e.g. for AArch64 there's an entry for:

CCIfType<[i32], CCAssignToRegWithShadow<[W0, W1, W2, W3, W4, W5, W6, W7],
                                        [X0, X1, X2, X3, X4, X5, X6, X7]>>,

Do you know whether there's a straightforward way to surface that?

vsk updated this revision to Diff 284487.Aug 10 2020, 1:01 PM

Rebase.

xbolva00 added inline comments.
llvm/lib/Transforms/IPO/HotColdSplitting.cpp
89

Maybe @ctopper knows?

rjf added inline comments.Aug 10 2020, 3:55 PM
llvm/lib/Transforms/IPO/HotColdSplitting.cpp
89

Is there a way to get access to a TargetMachine instance here? If so, then maybe then you can use MCRegisterInfo::getNumRegs() which can be accessed via TargetMachine to get the number of registers (Apologies in advance if this isn't helping).

vsk added inline comments.Aug 11 2020, 5:36 PM
llvm/lib/Transforms/IPO/HotColdSplitting.cpp
89

MCRegisterInfo::getNumRegs() seems to count the number of registers available on the target, but we're just interested in the subset that can be used to pass pointer-sized values given the selected calling convention.

hiraditya added inline comments.Aug 14 2020, 10:17 AM
llvm/lib/Transforms/IPO/HotColdSplitting.cpp
89

It doesn't look like the calling convention is exposed to the middle end. So it might be tricky to do the plumbing work to get those values. Given that the calling convention do not usually change can we hard-code some commonly used values in this pass?

Briefly looking at CallingConvention files in lib/Target, it seems:

  • X86_64 only allows 4 integer arguments to be passed in a register, and 4 floating point.
  • Arch64, ARM, RISC-V have 8 argument registers.

Depending on a calling convention these values do change though.

wenlei added a subscriber: wenlei.Aug 15 2020, 11:52 PM
rcorcs added a subscriber: rcorcs.Aug 25 2020, 8:54 AM
rjf accepted this revision.EditedAug 28 2020, 10:26 PM

@vsk Hope these following data collected from compiling Firefox helps. The "Delta" values come from my previous patch: https://reviews.llvm.org/D84468. Based on the data, this patch performs very well at reducing the code size.
-Os:

DeltaSize (including dynamic libraries)
D597152.184796592 GB
02.188262032 GB
52.206931464 GB

-O3:

DeltaSize (including dynamic libraries)
-22.270277648 GB
02.247788640 GB
D597152.243288440 GB
22.259242024 GB
52.270277648 GB
This revision is now accepted and ready to land.Aug 28 2020, 10:26 PM

Any plans to merge this patch?

vsk added a comment.Sep 9 2020, 12:42 PM

@hiraditya @rjf, thanks for sharing those numbers. At the moment I don't have the bandwidth to evaluate setting a different max-params value on x86. Let me know how you'd prefer to proceed - I'm happy to have either of you commandeer the patch, or if you prefer, I can land it in its current state.

rjf added a comment.EditedSep 10 2020, 9:11 PM

@vsk If you're curious about how the max-params value affects code size, I think I can find some time to collect some numbers soon (on firefox). I'll post it here as soon as I have it.

@rjf do you have numbers on Firefox. Let us know.