This is an archive of the discontinued LLVM Phabricator instance.

X86: Fix X86CallFrameOptimization to search for the COPY StackPointer
ClosedPublic

Authored by zvi on Oct 10 2017, 6:57 AM.

Details

Summary

SelectionDAG inserts a copy of ESP into a virtual register.
X86CallFrameOptimization assumed that the COPY, if present, is always
right after the call-frame setup instruction (ADJCALLSTACKDOWN). This was a
wrong assumption as the COPY can be located anywhere between the call-frame setup
instruction and its first use. If the COPY happened to be located in a different
location than what X86CallFrameOptimization assumed, visiting it while
processing the call chain would lead to a conservative bail-out.

The fix is quite straightfoward, scan ahead for the stack-pointer copy and make note
of it so it can be ignored while processing the call chain.

Fixes pr34903

Event Timeline

zvi created this revision.Oct 10 2017, 6:57 AM
zvi added inline comments.Oct 10 2017, 7:20 AM
test/CodeGen/X86/movtopush.ll
231

Overall code size reduced by 15 bytes

297

The check's don't show the issue at hand, so i will explain:
With the fix, the pass now fires and replacing mov's with push's results in the following

  1. Overall code size reduced by ~60 bytes
  2. There is an extra overhead of resetting (with an add instruction) the frame after each call (but we still win overall in code size)
314

Overall code size reduced by 24 bytes

rnk added a subscriber: rnk.Oct 10 2017, 6:28 PM

Thanks, looks good to me!

rnk accepted this revision.Oct 23 2017, 5:37 PM

I've already said I think this looks good, but I wasn't listed as a reviewer. Were you waiting for me or someone else on the reviewer list?

This revision is now accepted and ready to land.Oct 23 2017, 5:37 PM
DavidKreitzer edited edge metadata.Oct 23 2017, 7:29 PM

Just one minor suggestion. Otherwise, this LGTM also.

lib/Target/X86/X86CallFrameOptimization.cpp
382

The indentation is off here, which is particularly confusing since the closing brace is aligned with the 'for'.

This revision was automatically updated to reflect the committed changes.
zvi added inline comments.Oct 24 2017, 12:39 AM
lib/Target/X86/X86CallFrameOptimization.cpp
382

Good catch. Will fix at commit-time.

DavidKreitzer added inline comments.Oct 24 2017, 8:05 AM
llvm/trunk/lib/Target/X86/X86CallFrameOptimization.cpp
393 ↗(On Diff #120006)

I think there is a potential problem with this new loop structure. If the loop terminates by reaching MBB.end(), the code will attempt to advance the iterator beyond MBB.end(). Having the loop terminate by reaching MBB.end() is probably not an expected situation, but it seems like we should fix this regardless (unless I'm misunderstanding something).

zvi added inline comments.Oct 24 2017, 3:25 PM
llvm/trunk/lib/Target/X86/X86CallFrameOptimization.cpp
393 ↗(On Diff #120006)

In general, your concern is valid as it is possible that the frame-setup,call,frame-destroy instructions be located in exclusive BB's. FWIW, MachineVerifier::verifyStackFrame checks for a weaker requirement.
However, in this pass, X86CallFrameOptimization::isLegal requires that the entire call chain be located in a single BB. See the comment starting with "You would expect straight-line code...".
I came across this dilemma when adding the loop in line 375. That code assumes we meet the call instruction before end of the BB, which is only correct under the assumption of all-in-same-BB
I made a note to do some clean-up on this pass of removing overly-conservative checks of reaching end of BB. And of course, we should consider improving this pass to handle the multi-BB call chains.