Differential D18573
[X86] Enable call frame optimization ("mov to push") not only for optsize (PR26325)
ClosedPublic · Authored by hans on Mar 29 2016, 1:43 PM

Details

Summary

The size savings are significant, and from what I can tell, both ICC and GCC do this [1] [2]. Please let me know what you think.

[1] https://godbolt.org/g/OtrQRa
[2] https://godbolt.org/g/5We0Zx
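(Editor's note: a rough illustration of where the size savings come from, not taken from the patch or its tests; the instruction sequences and byte counts are approximate and depend on the operands and calling convention.)

```c
/* Illustrative sketch only -- not from the patch. A 32-bit x86 caller
 * passing two small immediate arguments. */
void callee(int a, int b);

void caller(void) {
  callee(1, 2);
  /* Reserved call frame ("mov" form), with %esp adjusted once in the
   * prologue/epilogue:
   *   movl  $2, 4(%esp)     # 8 bytes
   *   movl  $1, (%esp)      # 7 bytes
   *   calll callee
   *
   * "mov to push" form:
   *   pushl $2              # 2 bytes (push imm8)
   *   pushl $1              # 2 bytes
   *   calll callee
   *   addl  $8, %esp        # 3 bytes of cleanup
   *
   * Even with the cleanup, the push form is less than half the size
   * for this call site. */
}
```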
Event Timeline

joerg added inline comments.
rnk edited edge metadata.

lgtm. I think this is production ready. We *mostly* build chromium with -Os, and we've fixed some bugs in this code.
This revision is now accepted and ready to land. (Mar 30 2016, 4:17 PM)

Closed by commit rL264966: [X86] Enable call frame optimization ("mov to push") not only for optsize… (authored by hans). (Mar 30 2016, 4:43 PM)

This revision was automatically updated to reflect the committed changes.
Thanks a lot, Hans!
Revision Contents
Diff 52155

llvm/trunk/lib/Target/X86/X86CallFrameOptimization.cpp
llvm/trunk/test/CodeGen/X86/2006-05-02-InstrSched1.ll
llvm/trunk/test/CodeGen/X86/2006-11-12-CSRetCC.ll
llvm/trunk/test/CodeGen/X86/atom-lea-sp.ll
llvm/trunk/test/CodeGen/X86/avx-intel-ocl.ll
llvm/trunk/test/CodeGen/X86/avx512-intel-ocl.ll
llvm/trunk/test/CodeGen/X86/call-push.ll
llvm/trunk/test/CodeGen/X86/cmpxchg-clobber-flags.ll
llvm/trunk/test/CodeGen/X86/coalescer-commute3.ll
llvm/trunk/test/CodeGen/X86/hipe-prologue.ll
llvm/trunk/test/CodeGen/X86/i386-shrink-wrapping.ll
llvm/trunk/test/CodeGen/X86/libcall-sret.ll
llvm/trunk/test/CodeGen/X86/localescape.ll
llvm/trunk/test/CodeGen/X86/mcu-abi.ll
llvm/trunk/test/CodeGen/X86/memset-2.ll
llvm/trunk/test/CodeGen/X86/mingw-alloca.ll
llvm/trunk/test/CodeGen/X86/movtopush.ll
llvm/trunk/test/CodeGen/X86/phys-reg-local-regalloc.ll
llvm/trunk/test/CodeGen/X86/segmented-stacks.ll
llvm/trunk/test/CodeGen/X86/seh-catch-all-win32.ll
llvm/trunk/test/CodeGen/X86/seh-stack-realign.ll
llvm/trunk/test/CodeGen/X86/shrink-wrap-chkstk.ll
llvm/trunk/test/CodeGen/X86/sse-intel-ocl.ll
llvm/trunk/test/CodeGen/X86/tailcall-stackalign.ll
llvm/trunk/test/CodeGen/X86/twoaddr-coalesce.ll
llvm/trunk/test/CodeGen/X86/vararg-callee-cleanup.ll
llvm/trunk/test/CodeGen/X86/win-catchpad-csrs.ll
llvm/trunk/test/CodeGen/X86/win-catchpad.ll
llvm/trunk/test/CodeGen/X86/win-cleanuppad.ll
llvm/trunk/test/CodeGen/X86/win32-eh-states.ll
llvm/trunk/test/CodeGen/X86/win32-seh-catchpad.ll
llvm/trunk/test/CodeGen/X86/win32-seh-nested-finally.ll
llvm/trunk/test/CodeGen/X86/win32_sret.ll
llvm/trunk/test/CodeGen/X86/xmulo.ll
llvm/trunk/test/CodeGen/X86/zext-fold.ll
Two things here for the updated patch. If the stack alignment requirement is only 32-bit, OR if the pushes have realigned the stack correctly (not sure if we care about the second part), the addls can be deferred to the end of the BB.
It's also cheaper to use a pop into some scratch register, if one is available.
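(Editor's note: a hedged sketch of the two suggestions above, not code from the patch; the function names and exact sequences are hypothetical.)

```c
/* Two calls in one basic block on 32-bit x86; sizes are for the
 * common short encodings. */
void f(int x);

void two_calls(void) {
  f(1);
  f(2);
  /* Cleaning up after every call:
   *   pushl $1
   *   calll f
   *   addl  $4, %esp        # 3 bytes
   *   pushl $2
   *   calll f
   *   addl  $4, %esp        # 3 bytes
   *
   * If only 4-byte stack alignment is required (or the pushes keep the
   * stack correctly aligned), the adjustments can be deferred and
   * merged at the end of the BB:
   *   pushl $1
   *   calll f
   *   pushl $2
   *   calll f
   *   addl  $8, %esp        # one 3-byte cleanup
   *
   * And when a scratch register is dead, each 4-byte adjustment can be
   * a 1-byte pop instead:
   *   popl  %ecx
   *   popl  %ecx            # 2 bytes total vs. 3 for addl $8, %esp
   */
}
```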