This is an archive of the discontinued LLVM Phabricator instance.

[X86] A first stab at a heuristic to estimate the size impact for converting movs to pushes
ClosedPublic

Authored by mkuper on Feb 11 2015, 7:24 AM.

Details

Summary

The idea is to go over all calls in the MachineFunction and compute:
a) For each callsite that can not use pushes, the penalty of not having a reserved call frame.
b) For each callsite that can use pushes, the gain of actually replacing the movs with pushes (and the potential penalty of having to readjust the stack).

This could be made more precise (e.g. by looking at the size of the constants, or even constructing the potential instruction and asking the MC layer for the encoding size. Not to mention trying to figure out the gains from folding.) but this should be a decent first approximation.

Diff Detail

Repository
rL LLVM

Event Timeline

mkuper updated this revision to Diff 19754.Feb 11 2015, 7:24 AM
mkuper retitled this revision from to [X86] A first stab at a heuristic to estimate the size impact for converting movs to pushes.
mkuper updated this object.
mkuper edited the test plan for this revision. (Show Details)
mkuper added a reviewer: rnk.
mkuper added a subscriber: Unknown Object (MLST).
mkuper updated this revision to Diff 19765.Feb 11 2015, 8:56 AM

Functions that do not take any parameters through the stack (or no parameters at all) should not count towards the heuristic.
Thanks to Roman Divacky for providing a test-case where this matters.

mkuper updated this revision to Diff 19767.Feb 11 2015, 9:32 AM

Same as before, but with a flow that makes a bit more sense.

rnk accepted this revision.Feb 11 2015, 2:00 PM
rnk edited edge metadata.

lgtm

lib/Target/X86/X86CallFrameOptimization.cpp
193–195 ↗(On Diff #19767)

Not if the calling convention is callee-pop. In fact, if the convention is callee-pop, using a reserved call frame requires a sub, which should give the 'mov' lowering a penalty.

Anyway, not a blocking issue, just a heuristic worth adding.

199 ↗(On Diff #19767)

Can we motivate the 3 byte saving heuristic a bit more?

test/CodeGen/X86/movtopush.ll
298–299 ↗(On Diff #19767)

I'd really like to be able to convert to pushes for __thiscall methods, which effectively have one inreg parameter.

This revision is now accepted and ready to land.Feb 11 2015, 2:00 PM

Thanks, Reid!

lib/Target/X86/X86CallFrameOptimization.cpp
193–195 ↗(On Diff #19767)

Right, will add a TODO here and get to that separately, thanks.

199 ↗(On Diff #19767)

So, 3 is an average value that looks reasonable, although it may be a bit conservative.

It depends on two things:
a) What is the value being put on the stack (register,8-bit integer or >8-bit integer)
b) For a mov, what is the displacement w.r.t to %esp (0, < 7-bits, > 7-bits)
For pushes, the encoding size for the three options of (a) are 1/2/5
For mov (%esp), they are 3/7/7, for a difference of 2/5/2. But this can only happen once per call-site.
For mov k(%esp), for k < 128, which is probably the most common case, an additional byte is encoded, and they are 4/8/8, for a difference of 3/6/3.
For mov k(%esp), for k >= 128, 4 bytes are used to encode k, so we have 7/11/11. This is probably fairly rare and can be ignored.

To me, looking at the numbers, 3 seems like a good bet, unless we want to special-case each of the above options. It won't be too precise (this also doesn't factor in the potential benefit of removing a mov by folding) - but I'm not trying to be extremely precise here, I just want to avoid making "obviously wrong" decisions.

test/CodeGen/X86/movtopush.ll
298–299 ↗(On Diff #19767)

I agree.
The next two things I want to do here are remove the push <fi> restrictions, and support __thiscall.

This revision was automatically updated to reflect the committed changes.