This is an archive of the discontinued LLVM Phabricator instance.

Differential D13757

[x86] promote 'add nsw' to a wider type to allow more combines
ClosedPublic

Authored by spatel on Oct 14 2015, 4:51 PM.

Details

Reviewers

qcolombet
chandlerc
majnemer
sanjoy
zansari
hfinkel

Commits

rGbbd524496ccf: [x86] promote 'add nsw' to a wider type to allow more combines
rL250560: [x86] promote 'add nsw' to a wider type to allow more combines

Summary

The motivation for this patch starts with PR20134:
https://llvm.org/bugs/show_bug.cgi?id=20134

void foo(int *a, int i) {
  a[i] = a[i+1] + a[i+2];
}

It seems better to produce this (14 bytes):

movslq	%esi, %rsi
movl	0x4(%rdi,%rsi,4), %eax
addl	0x8(%rdi,%rsi,4), %eax
movl	%eax, (%rdi,%rsi,4)

Rather than this (22 bytes):

leal	0x1(%rsi), %eax
cltq             
leal	0x2(%rsi), %ecx      
movslq	%ecx, %rcx     
movl	(%rdi,%rcx,4), %ecx
addl	(%rdi,%rax,4), %ecx
movslq	%esi, %rax       
movl	%ecx, (%rdi,%rax,4)
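
As a rough illustration of why the shorter sequence computes the same addresses, the standalone C++ sketch below models the byte offset of `a[i + c]` for 4-byte elements. It is not LLVM code, and `offsetNarrowAdd`, `offsetWideAdd`, and `checkOffsets` are hypothetical names. It assumes `i + c` does not overflow 32 bits, which is what 'nsw' promises. Under that assumption, adding before or after the sign extension gives the same offset, so the constant can become a displacement.

```cpp
#include <cassert>
#include <cstdint>

// "Old" order: add in 32 bits, then sign-extend, then scale by the element size.
// Precondition (the 'nsw' guarantee): i + c does not overflow int32_t.
constexpr int64_t offsetNarrowAdd(int32_t i, int32_t c) {
  return static_cast<int64_t>(i + c) * 4;
}

// "New" order: sign-extend i once, then fold c * 4 into a constant displacement.
constexpr int64_t offsetWideAdd(int32_t i, int32_t c) {
  return static_cast<int64_t>(i) * 4 + static_cast<int64_t>(c) * 4;
}

// Spot checks for the a[i+1] and a[i+2] accesses from the example above.
inline void checkOffsets() {
  assert(offsetNarrowAdd(10, 1) == offsetWideAdd(10, 1));
  assert(offsetNarrowAdd(10, 2) == offsetWideAdd(10, 2));
  assert(offsetNarrowAdd(-3, 2) == offsetWideAdd(-3, 2));
}
```

In the wide form, `static_cast<int64_t>(i) * 4` is shared by every access, which is why a single `movslq` is enough in the 14-byte sequence.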

But it wasn't clear to me where the fix(es) should go, so I tried several things: CodeGenPrepare, DAGCombiner, X86ISelLowering, X86ISelDAGToDAG...and finally back to X86ISelLowering because that had the most effect for the least amount of patch. :)

I think the most basic problem (the first test case in the patch combines constants) could also be fixed in InstCombine, but it gets more complicated after that because we need to consider architecture and micro-architecture. For example, I don't think AArch64 sees any benefit from the more general transform because the ISA handles the sign extension in hardware. Some x86 chips may not want to replace 2 ADD insts with 1 LEA, and there's an attribute for that: FeatureSlowLEA. But I suspect that doesn't go far enough or maybe it's not getting used when it should; I'm also not sure if FeatureSlowLEA should also mean "slow complex addressing mode".

FWIW, I see no perf differences on test-suite with this change running on AMD Jaguar, and I see only very small code size improvements when building clang and the LLVM tools with the patched compiler. It would be great if someone could try this patch on a recent Intel model to see if it makes any difference. We may want to limit this to optimizing for size and/or modify FeatureSlowLEA if this is a bad change for Intel big cores.

Event Timeline

spatel updated this revision to Diff 37416. Oct 14 2015, 4:51 PM

spatel retitled this revision from to [x86] promote 'add nsw' to a wider type to allow more combines.

spatel updated this object.

spatel added reviewers: hfinkel, chandlerc, qcolombet, zansari.

spatel added a subscriber: llvm-commits.

Herald added a subscriber: aemerson. Oct 14 2015, 4:51 PM

Hi Sanjay,

I’m fine with the patch as a temporary solution, assuming this is critical to have a fix sooner rather than later.

For the longer term, this kind of transformation should be done in a generic place so that all targets benefit. Right now, as I imagine you saw, we have the infrastructure to perform such promotion in CodeGenPrepare. Moreover, we already have a dedicated model to expose more “combines” for addressing modes and extended loads.
Therefore, to me, it seems the right thing is to extend CodeGenPrepare to expose more of such combines. I am not saying that is easy :).

What do you think?

Cheers,
-Quentin

In D13757#268098, @qcolombet wrote:

Therefore, to me, it seems the right thing is to extend CodeGenPrepare to expose more of such combines. I am not saying that is easy :).

Hi Quentin - thanks for looking at this patch!

Yes, my head is still spinning from a debugger session that stepped through the CGP code that you added with:
http://reviews.llvm.org/rL200947

I did note some problems before I ran away. :)
Let me list those here for reference:

1. Something is going wrong when accounting for the scale factor in the address mode. The test cases for r200947 all use an i8*, so this was not exposed in that commit. I think that's the problem that is preventing the test case for PR20134 from getting optimized (because it uses i32*).
2. Even for some of the simple test cases in test/CodeGen/X86/codegen-prepare-addrmode-sext.ll, I saw that we were having to use the rollback mechanism. This seemed wrong to me, but I admit again that it was difficult to follow.
3. I didn't see a quick way to expand the CGP code to handle the general math that x86 can do with LEA as shown in the test cases here, and it wasn't clear to me that any other architecture would want to make that transform.

So yes, I would agree that we can and should do more to CGP or perhaps DAGCombiner to make this more general, but the simplicity and reach of the x86 isel change was too good for me to pass up. And so if we're not doing any x86 harm with this patch, I would like to get this checked in and continue investigating how to do these transforms more generally.
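
The scale-factor point can be sketched with a toy model of an x86 address (base + index * scale + displacement). This is an illustration only: `AddrMode`, `isLegalScale`, `effectiveAddress`, `foldIndexedAddress`, and `checkAddrModes` are made-up names, not CodeGenPrepare or LLVM APIs, and the fold assumes that the 32-bit `i + c` does not overflow. With 4-byte elements and c = 5 the displacement becomes 20, and with 8-byte elements and c = -5 it becomes -40, matching the `gep32` and `gep64` checks in this diff's test file.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of an x86 memory operand: Base + Index * Scale + Disp.
struct AddrMode {
  int64_t Base;
  int64_t Index;
  int64_t Scale; // x86 can only encode 1, 2, 4, or 8
  int64_t Disp;
};

constexpr bool isLegalScale(int64_t Scale) {
  return Scale == 1 || Scale == 2 || Scale == 4 || Scale == 8;
}

constexpr int64_t effectiveAddress(const AddrMode &AM) {
  return AM.Base + AM.Index * AM.Scale + AM.Disp;
}

// Fold 'Base + sext(I + C) * ElemSize' into one address mode.
// Assumption: I + C does not overflow 32 bits (the 'nsw' guarantee), so the
// constant can be extended on its own and pre-multiplied by the element size.
constexpr AddrMode foldIndexedAddress(int64_t Base, int32_t I, int32_t C,
                                      int64_t ElemSize) {
  return AddrMode{Base, static_cast<int64_t>(I), ElemSize,
                  static_cast<int64_t>(C) * ElemSize};
}

inline void checkAddrModes() {
  // i32 elements, index i + 5: displacement 20, scale 4.
  assert(foldIndexedAddress(0, 7, 5, 4).Disp == 20);
  // i64 elements, index i - 5: displacement -40, scale 8.
  assert(foldIndexedAddress(0, 7, -5, 8).Disp == -40);
  // A scale of 16 (i128 elements) cannot be encoded directly.
  assert(!isLegalScale(16));
}
```

The element size shows up twice, once as the scale and once multiplied into the displacement, which is the accounting an i8* test case (scale 1) would not exercise.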

Just another +1 on Quentin's comment regarding it being nice to see fewer extends in DAGCombine to expose more opportunities.

Given the explained complexity with that, these changes seem reasonable to me. Just made one small comment.

Thanks,
Zia.

lib/Target/X86/X86ISelLowering.cpp
25702	How about also SUB instructions to catch the simple constant merging case, similar to the one in add_nsw_consts?

Hi Sanjay

the simplicity and reach of the x86 isel change was too good for me to pass up. And so if we're not doing any x86 harm with this patch, I would like to get this checked in and continue investigating how to do these transforms more generally.

Sounds good to me!

LGTM once you've addressed Zia's concern.

Cheers,
-Quentin

spatel added reviewers: sanjoy, majnemer. Oct 16 2015, 9:16 AM

spatel added inline comments.

lib/Target/X86/X86ISelLowering.cpp
25702	A 'sub with constant' is always canonicalized to 'add with -constant':

// fold (sub x, c) -> (add x, -c)
http://llvm.org/docs/doxygen/html/DAGCombiner_8cpp_source.html#l01904

So I was about to say that we don't have to worry about the 'sub' case...except that the above transform doesn't propagate the 'nsw'. So we still don't have to worry about the 'sub' case in this code. :)

I noted the lack of nsw propagation bug in D12095 (and Hal warned about the consequences), but I didn't do anything about it...and now it's crawled out from under the rug.

Worse still, I didn't think about adding an 'nsw' to the wide 'add' that I'm creating in this patch. In my defense, getting the flags right in IR transforms has been an ongoing challenge, so at least we're conservatively correct by dropping the flags...I hope.

cc'ing Sanjoy and David for their nsw expertise. Unless there's a correctness bug here, I think we should get this patch checked in, and then we'll have to start working on propagating nsw/nuw/exact flags all over the DAG.
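
One corner case shows why carrying 'nsw' across the sub-to-add fold needs care. The sketch below is an editorial illustration, not something taken from DAGCombiner, and all function names in it are hypothetical. It models 32-bit values with exact 64-bit arithmetic. For every constant except INT32_MIN, `x - c` and `x + (-c)` are the same exact sum, so the flag could carry over. For INT32_MIN the negation wraps back to INT32_MIN, and an in-range subtract can turn into an out-of-range add.

```cpp
#include <cassert>
#include <cstdint>

// True if an exactly computed result fits in int32_t, i.e. 'nsw' would hold.
constexpr bool fitsInI32(int64_t V) {
  return V >= INT32_MIN && V <= INT32_MAX;
}

// Would 'sub nsw X, C' stay in range?
constexpr bool subIsNSW(int32_t X, int32_t C) {
  return fitsInI32(static_cast<int64_t>(X) - static_cast<int64_t>(C));
}

// The constant produced by 'fold (sub x, c) -> (add x, -c)': -C with 32-bit
// wraparound. Only INT32_MIN wraps, and it wraps to itself.
constexpr int32_t negatedConstant(int32_t C) {
  return C == INT32_MIN ? INT32_MIN : -C;
}

// Would 'add nsw X, -C' stay in range?
constexpr bool addOfNegatedIsNSW(int32_t X, int32_t C) {
  return fitsInI32(static_cast<int64_t>(X) +
                   static_cast<int64_t>(negatedConstant(C)));
}

inline void checkSubToAdd() {
  // Ordinary constant: both forms agree.
  assert(subIsNSW(100, 5) && addOfNegatedIsNSW(100, 5));
  // C == INT32_MIN: -1 - INT32_MIN == INT32_MAX is in range, but
  // -1 + INT32_MIN is not, so the flag must not be copied blindly.
  assert(subIsNSW(-1, INT32_MIN));
  assert(!addOfNegatedIsNSW(-1, INT32_MIN));
}
```

Dropping the flag, as the fold is described as doing above, is always safe in this model. It only loses information.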

spatel mentioned this in D13740: Catch combine opportunities for redundant imuls. Oct 16 2015, 9:36 AM

Thanks for the follow up and explanation, Sanjay. lgtm..

Thanks,
Zia.

qcolombet accepted this revision. Oct 16 2015, 10:44 AM

qcolombet edited edge metadata.

This revision is now accepted and ready to land. Oct 16 2015, 10:44 AM

majnemer added inline comments. Oct 16 2015, 11:08 AM

lib/Target/X86/X86ISelLowering.cpp
25702	It should be fine to transform `sext(add nsw %X, %Y)` into `add nsw (sext %X), (sext %Y)`.
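
The claim above can be checked exhaustively on a small type pair. The sketch below uses i8 and i16 as stand-ins for i32 and i64 and models two's-complement wraparound by hand. The helper names (`wrapSigned`, `sextOfAddNSWMatchesWideAdd`, `checkSextOfAddNSW`) are hypothetical and none of this is LLVM code. It verifies that whenever the narrow add does not wrap, the wide add of the sign-extended operands yields the same value and does not wrap either, so keeping 'nsw' on the wide add is consistent.

```cpp
#include <cassert>
#include <cstdint>

// Interpret the low 'Bits' bits of V as a two's-complement signed value.
inline int wrapSigned(int V, int Bits) {
  const int Mod = 1 << Bits;
  const int M = ((V % Mod) + Mod) % Mod; // reduce into [0, Mod)
  return M >= Mod / 2 ? M - Mod : M;
}

// For all i8 pairs: if 'add nsw i8 X, Y' does not wrap, then
// sext(add(X, Y)) == add(sext(X), sext(Y)) and the i16 add does not wrap.
inline bool sextOfAddNSWMatchesWideAdd() {
  for (int X = -128; X <= 127; ++X) {
    for (int Y = -128; Y <= 127; ++Y) {
      const int Exact = X + Y;
      const int Narrow = wrapSigned(Exact, 8); // i8 add; sext keeps the value
      if (Narrow != Exact)
        continue; // the narrow add wrapped, so 'nsw' gives no guarantee here
      const int Wide = wrapSigned(Exact, 16);  // i16 add of extended operands
      const bool WideNSW = (Wide == Exact);
      if (Wide != Narrow || !WideNSW)
        return false;
    }
  }
  return true;
}

inline void checkSextOfAddNSW() {
  assert(sextOfAddNSWMatchesWideAdd());
  // Without 'nsw' the rewrite would be wrong: 127 + 1 wraps to -128 in i8,
  // while the i16 add of the extended operands gives 128.
  assert(wrapSigned(127 + 1, 8) == -128);
  assert(wrapSigned(127 + 1, 16) == 128);
}
```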

Closed by commit rL250560: [x86] promote 'add nsw' to a wider type to allow more combines (authored by spatel). Oct 16 2015, 3:16 PM

This revision was automatically updated to reflect the committed changes.

Thanks, all.

Note: I added the nsw flag to the wider add created here since that looks correct, but I haven't found a way to expose that difference in a test case or even the debug output yet.

apilipenko mentioned this in D23359: [x86] X86ISelLowering zext(add_nuw(x, C)) --> add(zext(x), C_zext). Aug 10 2016, 9:44 AM

spatel mentioned this in rL355118: [InstCombine] fold adds of constants separated by sext/zext. Feb 28 2019, 11:06 AM

spatel mentioned this in rG4a47f5f55071: [InstCombine] fold adds of constants separated by sext/zext.

Revision Contents

Path                                  Size
lib/Target/X86/X86ISelLowering.cpp    49 lines
test/CodeGen/X86/add-nsw-sext.ll      38 lines

Diff 37416

lib/Target/X86/X86ISelLowering.cpp

... (25,669 lines not shown) ...
if (N00.getValueType() == MVT::v4i32 && ExtraVT.getSizeInBits() < 128) {
SDValue Tmp = DAG.getNode(ISD::SIGN_EXTEND_INREG, dl, MVT::v4i32,		SDValue Tmp = DAG.getNode(ISD::SIGN_EXTEND_INREG, dl, MVT::v4i32,
N00, N1);		N00, N1);
return DAG.getNode(ISD::SIGN_EXTEND, dl, MVT::v4i64, Tmp);		return DAG.getNode(ISD::SIGN_EXTEND, dl, MVT::v4i64, Tmp);
}		}
}		}
return SDValue();		return SDValue();
}		}

		/// sext(add_nsw(x, C)) --> add(sext(x), C_sext)
		/// Promoting a sign extension ahead of an 'add nsw' exposes opportunities
		/// to combine math ops, use an LEA, or use a complex addressing mode. This can
		/// eliminate extend, add, and shift instructions.
		static SDValue promoteSextBeforeAddNSW(SDNode *Sext, SelectionDAG &DAG,
		                                       const X86Subtarget *Subtarget) {
		  // TODO: This should be valid for other integer types.
		  EVT VT = Sext->getValueType(0);
		  if (VT != MVT::i64)
		    return SDValue();

		  // We need an 'add nsw' feeding into the 'sext'.
		  SDValue Add = Sext->getOperand(0);
		  if (Add.getOpcode() != ISD::ADD || !Add->getFlags()->hasNoSignedWrap())
		    return SDValue();

		  // Having a constant operand to the 'add' ensures that we are not increasing
		  // the instruction count because the constant is extended for free below.
		  // A constant operand can also become the displacement field of an LEA.
		  auto *AddOp1 = dyn_cast<ConstantSDNode>(Add.getOperand(1));
		  if (!AddOp1)
		    return SDValue();

		  // Don't make the 'add' bigger if there's no hope of combining it with some
		  // other 'add' or 'shl' instruction.
		  // TODO: It may be profitable to generate simpler LEA instructions in place
		  // of single 'add' instructions, but the cost model for selecting an LEA
		  // currently has a high threshold.
		  bool HasLEAPotential = false;
		  for (auto *User : Sext->uses()) {
		    if (User->getOpcode() == ISD::ADD || User->getOpcode() == ISD::SHL) {
		      HasLEAPotential = true;
		      break;
		    }
		  }
		  if (!HasLEAPotential)
		    return SDValue();

		  // Everything looks good, so pull the 'sext' ahead of the 'add'.
		  int64_t AddConstant = AddOp1->getSExtValue();
		  SDValue AddOp0 = Add.getOperand(0);
		  SDValue NewSext = DAG.getNode(ISD::SIGN_EXTEND, SDLoc(Sext), VT, AddOp0);
		  SDValue NewConstant = DAG.getConstant(AddConstant, SDLoc(Add), VT);
		  return DAG.getNode(ISD::ADD, SDLoc(Add), VT, NewSext, NewConstant);
		}

static SDValue PerformSExtCombine(SDNode *N, SelectionDAG &DAG,		static SDValue PerformSExtCombine(SDNode *N, SelectionDAG &DAG,
TargetLowering::DAGCombinerInfo &DCI,		TargetLowering::DAGCombinerInfo &DCI,
const X86Subtarget *Subtarget) {		const X86Subtarget *Subtarget) {
SDValue N0 = N->getOperand(0);		SDValue N0 = N->getOperand(0);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
EVT SVT = VT.getScalarType();		EVT SVT = VT.getScalarType();
EVT InVT = N0.getValueType();		EVT InVT = N0.getValueType();
EVT InSVT = InVT.getScalarType();		EVT InSVT = InVT.getScalarType();
... (78 lines not shown) ...
if (!Subtarget->hasInt256() && !(VT.getSizeInBits() % 128) &&
return DAG.getNode(ISD::CONCAT_VECTORS, DL, VT, Opnds);		return DAG.getNode(ISD::CONCAT_VECTORS, DL, VT, Opnds);
}		}
}		}

if (Subtarget->hasAVX() && VT.isVector() && VT.getSizeInBits() == 256)		if (Subtarget->hasAVX() && VT.isVector() && VT.getSizeInBits() == 256)
if (SDValue R = WidenMaskArithmetic(N, DAG, DCI, Subtarget))		if (SDValue R = WidenMaskArithmetic(N, DAG, DCI, Subtarget))
return R;		return R;

		if (SDValue NewAdd = promoteSextBeforeAddNSW(N, DAG, Subtarget))
		return NewAdd;

return SDValue();		return SDValue();
}		}

static SDValue PerformFMACombine(SDNode *N, SelectionDAG &DAG,		static SDValue PerformFMACombine(SDNode *N, SelectionDAG &DAG,
const X86Subtarget* Subtarget) {		const X86Subtarget* Subtarget) {
SDLoc dl(N);		SDLoc dl(N);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);

... (1,476 lines not shown) ...
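
For readers unfamiliar with SelectionDAG, the shape of the rewrite in `promoteSextBeforeAddNSW` can be sketched on a toy expression tree. This is a simplified stand-in: `Node`, `makeNode`, `toyPromoteSext`, and `checkToyPromote` are hypothetical and do not correspond to LLVM's `SDNode` API, and the sketch leaves out the i64 type check and the scan of the sext's users for an 'add' or 'shl'. Like the diff as shown, it leaves the flag on the new add unset. The author's closing note says the committed version sets 'nsw' on the wider add.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Toy expression node; NOT LLVM's SDNode.
struct Node {
  enum Kind { Value, Constant, Add, Sext };
  Kind K = Value;
  int64_t Imm = 0;  // payload, used only by Constant
  bool NSW = false; // 'no signed wrap' flag, used only by Add
  std::shared_ptr<Node> LHS, RHS;
};
using NodeRef = std::shared_ptr<Node>;

inline NodeRef makeNode(Node::Kind K, NodeRef L = nullptr, NodeRef R = nullptr,
                        int64_t Imm = 0, bool NSW = false) {
  auto N = std::make_shared<Node>();
  N->K = K;
  N->LHS = L;
  N->RHS = R;
  N->Imm = Imm;
  N->NSW = NSW;
  return N;
}

// sext(add_nsw(x, C)) --> add(sext(x), C)
// Returns nullptr when the pattern does not match.
inline NodeRef toyPromoteSext(const NodeRef &S) {
  if (!S || S->K != Node::Sext)
    return nullptr;
  NodeRef A = S->LHS;
  if (!A || A->K != Node::Add || !A->NSW)
    return nullptr; // need an 'add nsw' feeding the sext
  if (!A->RHS || A->RHS->K != Node::Constant)
    return nullptr; // need a constant right-hand operand
  NodeRef NewSext = makeNode(Node::Sext, A->LHS);
  NodeRef NewConst = makeNode(Node::Constant, nullptr, nullptr, A->RHS->Imm);
  return makeNode(Node::Add, NewSext, NewConst);
}

inline void checkToyPromote() {
  NodeRef X = makeNode(Node::Value);
  NodeRef C = makeNode(Node::Constant, nullptr, nullptr, 5);
  NodeRef S = makeNode(Node::Sext, makeNode(Node::Add, X, C, 0, true));
  NodeRef R = toyPromoteSext(S);
  assert(R && R->K == Node::Add && R->LHS->K == Node::Sext);
  assert(R->LHS->LHS == X && R->RHS->Imm == 5);
}
```
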

test/CodeGen/X86/add-nsw-sext.ll

; RUN: llc < %s -mtriple=x86_64-unknown-unknown | FileCheck %s		; RUN: llc < %s -mtriple=x86_64-unknown-unknown | FileCheck %s

; The fundamental problem: an add separated from other arithmetic by a sext can't		; The fundamental problem: an add separated from other arithmetic by a sext can't
; be combined with the later instructions. However, if the first add is 'nsw',		; be combined with the later instructions. However, if the first add is 'nsw',
; then we can promote the sext ahead of that add to allow optimizations.		; then we can promote the sext ahead of that add to allow optimizations.

define i64 @add_nsw_consts(i32 %i) {		define i64 @add_nsw_consts(i32 %i) {
; CHECK-LABEL: add_nsw_consts:		; CHECK-LABEL: add_nsw_consts:
; CHECK: # BB#0:		; CHECK: # BB#0:
; CHECK-NEXT: addl $5, %edi
; CHECK-NEXT: movslq %edi, %rax		; CHECK-NEXT: movslq %edi, %rax
; CHECK-NEXT: addq $7, %rax		; CHECK-NEXT: addq $12, %rax
; CHECK-NEXT: retq		; CHECK-NEXT: retq

%add = add nsw i32 %i, 5		%add = add nsw i32 %i, 5
%ext = sext i32 %add to i64		%ext = sext i32 %add to i64
%idx = add i64 %ext, 7		%idx = add i64 %ext, 7
ret i64 %idx		ret i64 %idx
}		}

; An x86 bonus: If we promote the sext ahead of the 'add nsw',		; An x86 bonus: If we promote the sext ahead of the 'add nsw',
; we allow LEA formation and eliminate an add instruction.		; we allow LEA formation and eliminate an add instruction.

define i64 @add_nsw_sext_add(i32 %i, i64 %x) {		define i64 @add_nsw_sext_add(i32 %i, i64 %x) {
; CHECK-LABEL: add_nsw_sext_add:		; CHECK-LABEL: add_nsw_sext_add:
; CHECK: # BB#0:		; CHECK: # BB#0:
; CHECK-NEXT: addl $5, %edi
; CHECK-NEXT: movslq %edi, %rax		; CHECK-NEXT: movslq %edi, %rax
; CHECK-NEXT: addq %rsi, %rax		; CHECK-NEXT: leaq 5(%rax,%rsi), %rax
; CHECK-NEXT: retq		; CHECK-NEXT: retq

%add = add nsw i32 %i, 5		%add = add nsw i32 %i, 5
%ext = sext i32 %add to i64		%ext = sext i32 %add to i64
%idx = add i64 %x, %ext		%idx = add i64 %x, %ext
ret i64 %idx		ret i64 %idx
}		}

; Throw in a scale (left shift) because an LEA can do that too.		; Throw in a scale (left shift) because an LEA can do that too.
; Use a negative constant (LEA displacement) to verify that's handled correctly.		; Use a negative constant (LEA displacement) to verify that's handled correctly.

define i64 @add_nsw_sext_lsh_add(i32 %i, i64 %x) {		define i64 @add_nsw_sext_lsh_add(i32 %i, i64 %x) {
; CHECK-LABEL: add_nsw_sext_lsh_add:		; CHECK-LABEL: add_nsw_sext_lsh_add:
; CHECK: # BB#0:		; CHECK: # BB#0:
; CHECK-NEXT: addl $-5, %edi
; CHECK-NEXT: movslq %edi, %rax		; CHECK-NEXT: movslq %edi, %rax
; CHECK-NEXT: leaq (%rsi,%rax,8), %rax		; CHECK-NEXT: leaq -40(%rsi,%rax,8), %rax
; CHECK-NEXT: retq		; CHECK-NEXT: retq

%add = add nsw i32 %i, -5		%add = add nsw i32 %i, -5
%ext = sext i32 %add to i64		%ext = sext i32 %add to i64
%shl = shl i64 %ext, 3		%shl = shl i64 %ext, 3
%idx = add i64 %x, %shl		%idx = add i64 %x, %shl
ret i64 %idx		ret i64 %idx
}		}
... (13 lines not shown) ...
; CHECK-NEXT: retq
ret i64 %ext		ret i64 %ext
}		}

; The typical use case: a 64-bit system where an 'int' is used as an index into an array.		; The typical use case: a 64-bit system where an 'int' is used as an index into an array.

define i8* @gep8(i32 %i, i8* %x) {		define i8* @gep8(i32 %i, i8* %x) {
; CHECK-LABEL: gep8:		; CHECK-LABEL: gep8:
; CHECK: # BB#0:		; CHECK: # BB#0:
; CHECK-NEXT: addl $5, %edi
; CHECK-NEXT: movslq %edi, %rax		; CHECK-NEXT: movslq %edi, %rax
; CHECK-NEXT: addq %rsi, %rax		; CHECK-NEXT: leaq 5(%rax,%rsi), %rax
; CHECK-NEXT: retq		; CHECK-NEXT: retq

%add = add nsw i32 %i, 5		%add = add nsw i32 %i, 5
%ext = sext i32 %add to i64		%ext = sext i32 %add to i64
%idx = getelementptr i8, i8* %x, i64 %ext		%idx = getelementptr i8, i8* %x, i64 %ext
ret i8* %idx		ret i8* %idx
}		}

define i16* @gep16(i32 %i, i16* %x) {		define i16* @gep16(i32 %i, i16* %x) {
; CHECK-LABEL: gep16:		; CHECK-LABEL: gep16:
; CHECK: # BB#0:		; CHECK: # BB#0:
; CHECK-NEXT: addl $-5, %edi
; CHECK-NEXT: movslq %edi, %rax		; CHECK-NEXT: movslq %edi, %rax
; CHECK-NEXT: leaq (%rsi,%rax,2), %rax		; CHECK-NEXT: leaq -10(%rsi,%rax,2), %rax
; CHECK-NEXT: retq		; CHECK-NEXT: retq

%add = add nsw i32 %i, -5		%add = add nsw i32 %i, -5
%ext = sext i32 %add to i64		%ext = sext i32 %add to i64
%idx = getelementptr i16, i16* %x, i64 %ext		%idx = getelementptr i16, i16* %x, i64 %ext
ret i16* %idx		ret i16* %idx
}		}

define i32* @gep32(i32 %i, i32* %x) {		define i32* @gep32(i32 %i, i32* %x) {
; CHECK-LABEL: gep32:		; CHECK-LABEL: gep32:
; CHECK: # BB#0:		; CHECK: # BB#0:
; CHECK-NEXT: addl $5, %edi
; CHECK-NEXT: movslq %edi, %rax		; CHECK-NEXT: movslq %edi, %rax
; CHECK-NEXT: leaq (%rsi,%rax,4), %rax		; CHECK-NEXT: leaq 20(%rsi,%rax,4), %rax
; CHECK-NEXT: retq		; CHECK-NEXT: retq

%add = add nsw i32 %i, 5		%add = add nsw i32 %i, 5
%ext = sext i32 %add to i64		%ext = sext i32 %add to i64
%idx = getelementptr i32, i32* %x, i64 %ext		%idx = getelementptr i32, i32* %x, i64 %ext
ret i32* %idx		ret i32* %idx
}		}

define i64* @gep64(i32 %i, i64* %x) {		define i64* @gep64(i32 %i, i64* %x) {
; CHECK-LABEL: gep64:		; CHECK-LABEL: gep64:
; CHECK: # BB#0:		; CHECK: # BB#0:
; CHECK-NEXT: addl $-5, %edi
; CHECK-NEXT: movslq %edi, %rax		; CHECK-NEXT: movslq %edi, %rax
; CHECK-NEXT: leaq (%rsi,%rax,8), %rax		; CHECK-NEXT: leaq -40(%rsi,%rax,8), %rax
; CHECK-NEXT: retq		; CHECK-NEXT: retq

%add = add nsw i32 %i, -5		%add = add nsw i32 %i, -5
%ext = sext i32 %add to i64		%ext = sext i32 %add to i64
%idx = getelementptr i64, i64* %x, i64 %ext		%idx = getelementptr i64, i64* %x, i64 %ext
ret i64* %idx		ret i64* %idx
}		}

; LEA can't scale by 16, but the adds can still be combined into an LEA.		; LEA can't scale by 16, but the adds can still be combined into an LEA.

define i128* @gep128(i32 %i, i128* %x) {		define i128* @gep128(i32 %i, i128* %x) {
; CHECK-LABEL: gep128:		; CHECK-LABEL: gep128:
; CHECK: # BB#0:		; CHECK: # BB#0:
; CHECK-NEXT: addl $5, %edi
; CHECK-NEXT: movslq %edi, %rax		; CHECK-NEXT: movslq %edi, %rax
; CHECK-NEXT: shlq $4, %rax		; CHECK-NEXT: shlq $4, %rax
; CHECK-NEXT: addq %rsi, %rax		; CHECK-NEXT: leaq 80(%rax,%rsi), %rax
; CHECK-NEXT: retq		; CHECK-NEXT: retq

%add = add nsw i32 %i, 5		%add = add nsw i32 %i, 5
%ext = sext i32 %add to i64		%ext = sext i32 %add to i64
%idx = getelementptr i128, i128* %x, i64 %ext		%idx = getelementptr i128, i128* %x, i64 %ext
ret i128* %idx		ret i128* %idx
}		}

; A bigger win can be achieved when there is more than one use of the		; A bigger win can be achieved when there is more than one use of the
; sign extended value. In this case, we can eliminate sign extension		; sign extended value. In this case, we can eliminate sign extension
; instructions plus use more efficient addressing modes for memory ops.		; instructions plus use more efficient addressing modes for memory ops.

define void @PR20134(i32* %a, i32 %i) {		define void @PR20134(i32* %a, i32 %i) {
; CHECK-LABEL: PR20134:		; CHECK-LABEL: PR20134:
; CHECK: # BB#0:		; CHECK: # BB#0:
; CHECK-NEXT: leal 1(%rsi), %eax		; CHECK-NEXT: movslq %esi, %rax
; CHECK-NEXT: cltq		; CHECK-NEXT: movl 4(%rdi,%rax,4), %ecx
; CHECK-NEXT: movl (%rdi,%rax,4), %eax		; CHECK-NEXT: addl 8(%rdi,%rax,4), %ecx
; CHECK-NEXT: leal 2(%rsi), %ecx		; CHECK-NEXT: movl %ecx, (%rdi,%rax,4)
; CHECK-NEXT: movslq %ecx, %rcx
; CHECK-NEXT: addl (%rdi,%rcx,4), %eax
; CHECK-NEXT: movslq %esi, %rcx
; CHECK-NEXT: movl %eax, (%rdi,%rcx,4)
; CHECK-NEXT: retq		; CHECK-NEXT: retq

%add1 = add nsw i32 %i, 1		%add1 = add nsw i32 %i, 1
%idx1 = sext i32 %add1 to i64		%idx1 = sext i32 %add1 to i64
%gep1 = getelementptr i32, i32* %a, i64 %idx1		%gep1 = getelementptr i32, i32* %a, i64 %idx1
%load1 = load i32, i32* %gep1, align 4		%load1 = load i32, i32* %gep1, align 4

%add2 = add nsw i32 %i, 2		%add2 = add nsw i32 %i, 2
... (11 lines not shown) ...