This is an archive of the discontinued LLVM Phabricator instance.

[x86] promote 'add nsw' to a wider type to allow more combines
ClosedPublic

Authored by spatel on Oct 14 2015, 4:51 PM.

Details

Summary

The motivation for this patch starts with PR20134:
https://llvm.org/bugs/show_bug.cgi?id=20134

void foo(int *a, int i) {
  a[i] = a[i+1] + a[i+2];
}

It seems better to produce this (14 bytes):

movslq	%esi, %rsi
movl	0x4(%rdi,%rsi,4), %eax
addl	0x8(%rdi,%rsi,4), %eax
movl	%eax, (%rdi,%rsi,4)

Rather than this (22 bytes):

leal	0x1(%rsi), %eax
cltq             
leal	0x2(%rsi), %ecx      
movslq	%ecx, %rcx     
movl	(%rdi,%rcx,4), %ecx
addl	(%rdi,%rax,4), %ecx
movslq	%esi, %rax       
movl	%ecx, (%rdi,%rax,4)

But it wasn't clear to me where the fix(es) should go, so I tried several things: CodeGenPrepare, DAGCombiner, X86IselLowering, X86ISelDAGToDAG...and finally back to X86ISelLowering because that had the most effect for the least amount of patch. :)

I think the most basic problem (the first test case in the patch combines constants) could also be fixed in InstCombine, but it gets more complicated after that because we need to consider architecture and micro-architecture. For example, I don't think AArch64 sees any benefit from the more general transform because the ISA solves the sexting in hardware. Some x86 chips may not want to replace 2 ADD insts with 1 LEA, and there's an attribute for that: FeatureSlowLEA. But I suspect that doesn't go far enough or maybe it's not getting used when it should; I'm also not sure if FeatureSlowLEA should also mean "slow complex addressing mode".

FWIW, I see no perf differences on test-suite with this change running on AMD Jaguar, and I see only very small code size improvements when building clang and the LLVM tools with the patched compiler. It would be great if someone could try this patch on a recent Intel model to see if it makes any difference. We may want to limit this to optimizing for size and/or modify FeatureSlowLEA if this is a bad change for Intel big cores.

Diff Detail

Repository
rL LLVM

Event Timeline

spatel updated this revision to Diff 37416.Oct 14 2015, 4:51 PM
spatel retitled this revision from to [x86] promote 'add nsw' to a wider type to allow more combines.
spatel updated this object.
spatel added a subscriber: llvm-commits.
qcolombet edited edge metadata.Oct 15 2015, 10:46 AM

Hi Sanjay,

I’m fine with the patch as a temporary solution, assuming this is critical to have a fix sooner rather than later.

For longer term, this kind of transformations should be done in a generic place for all targets to benefit. Right now, I imagine you saw, we have the infrastructure to perform such promotion in CodeGenPrepare. Moreover, we already have dedicated model to expose more “combines” for addressing modes and extended loads.
Therefore, to me, it seems the right thing is to extend CodeGenPrepare to expose more of such combines. I am not saying that is easy :).

What do you think?

Cheers,
-Quentin

Therefore, to me, it seems the right thing is to extend CodeGenPrepare to expose more of such combines. I am not saying that is easy :).

Hi Quentin - thanks for looking at this patch!

Yes, my head is still spinning from a debugger session that stepped through the CGP code that you added with:
http://reviews.llvm.org/rL200947

I did note some problems before I ran away. :)
Let me list those here for reference:

  1. Something is going wrong when accounting for the scale factor in the address mode. The test cases for r200947 all use an i8*, so this was not exposed in that commit. I think that's the problem that is preventing the test case for PR20134 from getting optimized (because it uses i32*).
  2. Even for some of the simple test cases in test/CodeGen/X86/codegen-prepare-addrmode-sext.ll, I saw that we were having to use the rollback mechanism. This seemed wrong to me, but I admit again that it was difficult to follow.
  3. I didn't see a quick way to expand the CGP code to handle the general math that x86 can do with LEA as shown in the test cases here, and it wasn't clear to me that any other architecture would want to make that transform.

So yes, I would agree that we can and should do more to CGP or perhaps DAGCombiner to make this more general, but the simplicity and reach of the x86 isel change was too good for me to pass up. And so if we're not doing any x86 harm with this patch, I would like to get this checked in and continue investigating how to do these transforms more generally.

zansari edited edge metadata.Oct 15 2015, 12:17 PM

Just another +1 on Quentin's comment regarding it being nice to see fewer extends in DAGCombine to expose more opportunities.

Given the explained complexity with that, these changes seem reasonable, to me. Just made one small comment/comment.

Thanks,
Zia.

lib/Target/X86/X86ISelLowering.cpp
25702 ↗(On Diff #37416)

How about also SUB instructions to catch the simple constant merging case, similar to the one in add_nsw_consts?

Hi Sanjay

the simplicity and reach of the x86 isel change was too good for me to pass up. And so if we're not doing any x86 harm with this patch, I would like to get this checked in and continue investigating how to do these transforms more generally.

Sounds good to me!

LGTM once you've addressed Zia's concern.

Cheers,
-Quentin

spatel added inline comments.
lib/Target/X86/X86ISelLowering.cpp
25702 ↗(On Diff #37416)

A 'sub with constant' is always canonicalized to 'add with -constant':

// fold (sub x, c) -> (add x, -c)

http://llvm.org/docs/doxygen/html/DAGCombiner_8cpp_source.html#l01904

So I was about to say that we don't have to worry about the 'sub' case...except that the above transform doesn't propagate the 'nsw'. So we still don't have to worry about the 'sub' case in this code. :)

I noted the lack of nsw propagation bug in D12095 (and Hal warned about the consequences), but I didn't do anything about it...and now it's crawled out from under the rug. Worse still, I didn't think about adding an 'nsw' to the wide 'add' that I'm creating in this patch.

In my defense, getting the flags right in IR transforms has been an ongoing challenge, so at least we're conservatively correct by dropping the flags...I hope. cc'ing Sanjoy and David for their nsw expertise.

Unless there's a correctness bug here, I think we should get this patch checked in, and then we'll have to start working on propagating nsw/nuw/exact flags all over the DAG.

Thanks for the follow up and explanation, Sanjay. lgtm..

Thanks,
Zia.

qcolombet accepted this revision.Oct 16 2015, 10:44 AM
qcolombet edited edge metadata.
This revision is now accepted and ready to land.Oct 16 2015, 10:44 AM
majnemer added inline comments.Oct 16 2015, 11:08 AM
lib/Target/X86/X86ISelLowering.cpp
25702 ↗(On Diff #37416)

It should be fine to transform sext(add nsw %X, %Y) into add nsw (sext %X), (sext %Y).

This revision was automatically updated to reflect the committed changes.

Thanks, all.

Note: I added the nsw flag to the wider add created here since that looks correct, but I haven't found a way to expose that difference in a test case or even the debug output yet.