This is an archive of the discontinued LLVM Phabricator instance.

[X86] Enable call frame optimization ("mov to push") not only for optsize (PR26325)
ClosedPublic

Authored by hans on Mar 29 2016, 1:43 PM.

Details

Diff Detail

Repository
rL LLVM

Event Timeline

hans updated this revision to Diff 51968.Mar 29 2016, 1:43 PM
hans retitled this revision from to [X86] Enable call frame optimization ("mov to push") not only for optsize (PR26325).
hans updated this object.
hans added reviewers: DavidKreitzer, mkuper, rnk.
hans added a subscriber: llvm-commits.
joerg added a subscriber: joerg.Mar 29 2016, 2:39 PM
joerg added inline comments.
test/CodeGen/X86/win32-seh-nested-finally.ll
46 ↗(On Diff #51968)

The changes here look suspicious, doesn't the code need to restore %esp before the final popl?

test/CodeGen/X86/xmulo.ll
28 ↗(On Diff #51968)

Each immediate push is two Bytes, materializing $0 as register is two Bytes as well, but a register push is one Byte. So for three pushes, a shorter sequence is actually:

xorl %eax, %eax
pushl %eax
pushl %eax
pushl %eax
test/CodeGen/X86/zext-fold.ll
38 ↗(On Diff #51968)

Two register pushes are smaller than one subl?

hans added inline comments.Mar 29 2016, 4:44 PM
test/CodeGen/X86/win32-seh-nested-finally.ll
46 ↗(On Diff #51968)

This does look weird. Reid, is there something magic about these invokes, or is mov-to-push broken here?

test/CodeGen/X86/xmulo.ll
28 ↗(On Diff #51968)

Yes, we ca probably be more efficient here. See also PR26330 where we get this wrong the other way around.

test/CodeGen/X86/zext-fold.ll
38 ↗(On Diff #51968)

The savings from the pushes offset the cost of the the sub.

The sub seems unnecessary though. It would be nice to not emit it.

hans added inline comments.Mar 30 2016, 3:13 PM
test/CodeGen/X86/win32-seh-nested-finally.ll
46 ↗(On Diff #51968)

Oh wait, %esp does get restored in between, it's just not in the test expectations. Let me fix that..

hans updated this revision to Diff 52139.Mar 30 2016, 3:13 PM

Clarify win32-seh-nested-finally.ll

rnk accepted this revision.Mar 30 2016, 4:17 PM
rnk edited edge metadata.

lgtm I think this is production ready. We *mostly* build chromium with -Os, and we've fixed some bugs in this code.

lib/Target/X86/X86CallFrameOptimization.cpp
176–178 ↗(On Diff #52139)

Huh, so we're already doing this a lot due to inalloca.

test/CodeGen/X86/win32-seh-nested-finally.ll
46 ↗(On Diff #52139)

Great, I was halfway through checking that before I got distracted...

test/CodeGen/X86/zext-fold.ll
38–39 ↗(On Diff #52139)

Actually, the sub probably won't be emitted on Mac and Windows, it will only appear on platforms with 16 byte stack alignment.

This revision is now accepted and ready to land.Mar 30 2016, 4:17 PM
This revision was automatically updated to reflect the committed changes.
joerg added inline comments.Mar 31 2016, 1:07 AM
llvm/trunk/test/CodeGen/X86/win32-seh-nested-finally.ll
56

Two things here for the updated patch. If the stack alignment requirement is 32bit only OR if the pushes have realigned the stack correctly (not sure if we care about the second part), the addls can be deferred to the end of the BB.
It's also cheaper to use a pop to some scratch register if available.

62

This and the next block is missing a pop still? Also, add looks strange and out of context for the rest of the purpose of the text case.

mkuper edited edge metadata.Mar 31 2016, 9:15 AM

Thanks a lot, Hans!
I apologize for not making this happen myself - never got to it, and then lost the ability to benchmark...

llvm/trunk/test/CodeGen/X86/win32-seh-nested-finally.ll
56

Regarding the last part - we already have code for that in X86FrameLowering::eliminateCallFramePseudoInstr(), but it's also currently only enabled for minsize.

hans added inline comments.Mar 31 2016, 9:49 AM
llvm/trunk/test/CodeGen/X86/win32-seh-nested-finally.ll
56

Using pop for more stack restores is PR26333.
Filed PR27165 for considering using pop for stack restore beyond -Os.

62

Adding more checks in r265027.

Are you saying add instead of pop looks out of place? See Michael's comment above. Or do you mean the "addl $12, %ebp" above?

hans added inline comments.Mar 31 2016, 9:56 AM
llvm/trunk/test/CodeGen/X86/win32-seh-nested-finally.ll
56

I mean, filed PR27165 for considering not restoring the stack between calls.