This is an archive of the discontinued LLVM Phabricator instance.

[X86] Use push-pop for materializing small constants under 'minsize'
ClosedPublic

Authored by hans on Dec 15 2015, 5:28 PM.

Details

Summary

This is a follow-up to r255656 which uses push-pop for materializing small constants with 3 bytes instead of a 5-byte (or more) mov.

Since it's slower, let's put it under 'minsize'.

Please take a look.

Diff Detail

Event Timeline

hans updated this revision to Diff 42944.Dec 15 2015, 5:28 PM
hans retitled this revision from to [X86] Use push-pop for materializing small constants under 'minsize'.
hans updated this object.
hans added reviewers: mkuper, DavidKreitzer.
hans added a subscriber: llvm-commits.

Hi Hans,

Compiling int f() { return -1; } with x86_64-pc-win32 gives:

f:                                      # @f
.Ltmp0:
.seh_proc f
# BB#0:                                 # %entry
.Ltmp1:
        .seh_endprologue
        pushl   $-1
        popl    %eax
        retq
        .seh_handlerdata
        .text
.Ltmp2:
        .seh_endproc

I believe this is a violation of the x64 ABI because your program has stack pointer adjustments after the prologue without also having a frame pointer.

emaste added a subscriber: emaste.Dec 15 2015, 8:01 PM
hans added a comment.Dec 16 2015, 11:02 AM

I believe this is a violation of the x64 ABI because your program has stack pointer adjustments after the prologue without also having a frame pointer.

Thanks for pointing this out. Sounds like we should avoid the push/pop lowering for Win64 in functions without frame pointers, then.

hans updated this revision to Diff 43037.Dec 16 2015, 11:02 AM

Avoid the push/pop lowering for Win64 in functions without frame pointers.

DavidKreitzer added inline comments.Dec 16 2015, 2:28 PM
test/CodeGen/X86/materialize.ll
44

The reason for deferring the use of xor-inc/dec on 64-bit was the possible REX prefixes, right? You'll have the same problem with the pop, though it will only cost one byte rather than two. Is that one-byte difference the reason you allow push/pop currently but avoid xor-inc/dec?

46

Hmmm, pushl and popl are not valid instructions in 64-bit mode. They should instead be pushq and popq, and the CFA adjustments should be +/-8. It looks like this is just a problem in the test, as the code to generate them looks fine.

hans added inline comments.Dec 16 2015, 4:25 PM
test/CodeGen/X86/materialize.ll
44

Exactly, even with REX, push/pop is always a win.

46

Hmm, it's not just the test. It's selecting the MOV32ImmSExti8 instruction. I'll fix that.

And I now realize that the instruction selector doesn't realize it can use MOV64ImmSExti8 to get a 32-bit constant. I tried to add a pattern for that, but failed so far.. :-/

hans updated this revision to Diff 43085.Dec 16 2015, 4:25 PM

Make MOV32ImmSExti8 unavailable in 64-bit mode.

DavidKreitzer added inline comments.Dec 17 2015, 7:07 AM
test/CodeGen/X86/materialize.ll
46

Ah, okay, thanks for the explanation. So maybe the right thing to do is to allow MOV32ImmSExti8 to be selected on both 32-bit and 64-bit. Then in ExpandMOVImmSExti8, base the push/pop size not on the opcode but on the target. (Just maybe assert that MOV64ImmSExti8 doesn't occur on 32-bit.)

hans added inline comments.Dec 17 2015, 9:32 AM
test/CodeGen/X86/materialize.ll
46

That sounds good to me. The tricky part is that we'll have to widen the target register of the pop, but I think that's doable.

hans updated this revision to Diff 43148.Dec 17 2015, 9:32 AM

Allow MOV32ImmSExti8 on 64-bit targets too, and widen it when expanding.

DavidKreitzer added inline comments.Dec 17 2015, 11:37 AM
lib/Target/X86/X86InstrInfo.cpp
5310

This looks correct, but you could write it more simply by unifying 5311-5318 & 5327-5330. They are the same except for the lines that change the result register. You could use getX86SubSuperRegister(MIB->getOperand(0).getReg(), MVT::i64), which will work regardless of whether the original register is 32 or 64 bits.

hans added inline comments.Dec 17 2015, 11:54 AM
lib/Target/X86/X86InstrInfo.cpp
5310

I hadn't seen getX86SubSuperRegister() before, that seems very handy. Thanks!

hans updated this revision to Diff 43167.Dec 17 2015, 11:55 AM

Simplify the code with getX86SubSuperRegister().

DavidKreitzer edited edge metadata.Dec 17 2015, 2:15 PM

LGTM

lib/Target/X86/X86InstrInfo.cpp
5310

Much better, thanks!

hans added a comment.Dec 17 2015, 3:21 PM

Many thanks for the review!

This revision was automatically updated to reflect the committed changes.
lib/Target/X86/X86InstrInfo.td