This is an archive of the discontinued LLVM Phabricator instance.

[x86] improve CMOV codegen by pushing add into operands, part 2
ClosedPublic

Authored by spatel on Jul 23 2021, 9:59 AM.

Details

Summary

This is a minimum extension of D106607 to allow folding for 2 non-zero constants.

In the reduced test examples, we save 1 instruction by rolling the constants into LEA/ADD. In the motivating test from the bullet benchmark, we absorb both of the constant moves into add ops via LEA magic, so we reduce by 2 instructions.

Diff Detail

Event Timeline

spatel created this revision.Jul 23 2021, 9:59 AM
spatel requested review of this revision.Jul 23 2021, 9:59 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 23 2021, 9:59 AM

Hmm, could you please add tests with 64-bit add immediate?

llvm/test/CodeGen/X86/add-cmov.ll
213–214

The other benefit is that the add can execute without waiting for the cmov.
https://godbolt.org/z/zz5EKhrKj

spatel updated this revision to Diff 361369.Jul 23 2021, 3:59 PM

Patch updated:

  1. Added tests for various larger-than-i32-immediate constants.
  2. Limited the fold if there's a large constant and the other constant is non-zero. It's possible that we can loosen that check, but I noticed that immediate 0x80000000 gets folded into subq $-2147483648, %rcx with an extra movq reg, reg, so that could increase the total number of instructions.

Hmm, could you please add tests with 64-bit add immediate?

Good catch - this transform is based on being able to use immediate operands to save instructions. If the constants are too big, then we should not do it.

lebedev.ri accepted this revision.Jul 24 2021, 12:36 AM

LG unless there are other comments.
Thanks.

llvm/test/CodeGen/X86/add-cmov.ll
158–159

Would transforming this into

%tval = add i64 %offset, 42
%fval = add i64 %tval, 2147483606
%r = select i1 %b, i64 %tval, i64 %fval

be a win? https://godbolt.org/z/rYjzePnr4

This revision is now accepted and ready to land.Jul 24 2021, 12:36 AM
RKSimon accepted this revision.Jul 24 2021, 1:36 AM

LGTM - cheers

RKSimon added inline comments.Jul 24 2021, 4:17 AM
llvm/test/CodeGen/X86/add-cmov.ll
251

I think this could be: https://llvm.godbolt.org/z/WMaPvfKKh

leaq (%rdx,%rdx,4), %rax
shlq $4, %rax
leaq 6(%rax), %rcx
testb $1, %dil
cmovneq %rax, %rcx
leaq 60(%rsi,%rcx), %rax
spatel marked an inline comment as done.Jul 25 2021, 6:38 AM
spatel added inline comments.
llvm/test/CodeGen/X86/add-cmov.ll
158–159

Yes, it seems likely to be a win if we can combine the adds sequentially like that. We should check how often we see large constants like this before making the transform more complicated though. I don't have stats on it, but seems unlikely?

This is the example that I was referring to when I updated the patch - we change the add to sub somewhere in X86SelDAGToDAG causing a seemingly extra reg copy later on:

movq	%rdi, %rax
movq	%rdi, %rcx
subq	$-2147483648, %rcx              ## imm = 0x80000000
addq	$42, %rax
testb	$1, %sil
cmoveq	%rcx, %rax
251

We would need to factor out the common ops in the expressions and move it after the select...not sure what layer to try that yet. I think that's the same as what you showed in https://llvm.org/PR51069, I'll put this into the bug report, so we have another example to look at.

This revision was landed with ongoing or failed builds.Jul 25 2021, 7:08 AM
This revision was automatically updated to reflect the committed changes.
spatel marked an inline comment as done.