This is an archive of the discontinued LLVM Phabricator instance.

[DAG] Do MergeConsecutiveStores again before Instruction Selection
Closed Public

Authored by niravd on May 30 2017, 6:38 AM.


Details

Reviewers

jyknight
hfinkel
efriedma
rnk
jmolloy
RKSimon

Commits

rGdb77e57ea86d: [DAG] Do MergeConsecutiveStores again before Instruction Selection
rL319036: [DAG] Do MergeConsecutiveStores again before Instruction Selection

Summary

Enable post-legalization store merging by default for all targets, extending it to non-X86 machines, to allow merging stores from lowered intrinsics / calls.

Pre-legalization store merging cannot yet be disabled, as nodes with custom lowering may be lowered during legalization, obscuring some merge candidates.
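
As a rough illustration of the hook this change flips, the sketch below models a target-lowering class whose virtual `mergeStoresAfterLegalization()` defaults to true, plus a hypothetical target that opts out. `TargetLoweringModel`, `OptOutTarget` and `shouldRunPostLegalizeMerge` are stand-ins invented for this example and are not LLVM classes; only the hook name and its new default are taken from the diff.

```cpp
#include <cassert>

// Stand-in for llvm::TargetLowering; models only the hook touched here.
struct TargetLoweringModel {
  virtual ~TargetLoweringModel() = default;
  // With this patch the default becomes `true` (it was `false` before).
  virtual bool mergeStoresAfterLegalization() const { return true; }
};

// Hypothetical target that keeps the old behaviour by overriding the hook.
struct OptOutTarget : TargetLoweringModel {
  bool mergeStoresAfterLegalization() const override { return false; }
};

// Hypothetical driver check: the post-legalization merge runs only when
// the target's hook allows it.
inline bool shouldRunPostLegalizeMerge(const TargetLoweringModel &TLI) {
  return TLI.mergeStoresAfterLegalization();
}
```

Under this model a target that regresses could restore the previous behaviour with a one-line override while the default stays on for everyone else.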

Diff Detail

Repository: rL LLVM

Event Timeline

niravd created this revision. May 30 2017, 6:38 AM

Herald added a subscriber: javed.absar. May 30 2017, 6:38 AM

niravd added a parent revision: D33518: [AArch64] Fix stores of zero values. May 30 2017, 6:42 AM

5% of time in DAGCombine, or 5% total? 5% total is a lot for an optimization which triggers relatively rarely.

It's about a 5% time increase for the sum of DAG Combine phases on my bad test cases for store merge (large basic blocks with a large number of stores that are offset from the same base but are not mergeable). It looks closer to 2% on the total. In the majority of cases, the effect is negligible.

A simple caching check on the PersistentId could remove most of the redundant work. I had plans to look at a more efficient version of this pass once the pre-legal type merge was removed and the nodes are more stable.
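
One way to picture the caching idea: remember the IDs of store nodes for which a merge attempt has already failed, and skip them on later visits. This is only a sketch under the assumption that each node carries a stable integer ID; `MergeAttemptCache` and its methods are hypothetical names, and a real implementation would also have to invalidate entries whenever a node or its neighbours change.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Hypothetical cache keyed on a stable per-node ID.
class MergeAttemptCache {
  std::unordered_set<std::uint64_t> Failed;

public:
  // True if a merge rooted at this node was already tried and failed.
  bool alreadyFailed(std::uint64_t NodeId) const {
    return Failed.count(NodeId) != 0;
  }
  // Record a failed attempt so later combine visits can skip the node.
  void recordFailure(std::uint64_t NodeId) { Failed.insert(NodeId); }
  // Forget a node, e.g. after the DAG around it has been rewritten.
  void invalidate(std::uint64_t NodeId) { Failed.erase(NodeId); }
};
```

The benefit would show up mainly in the bad case described above, where many stores share a base but never merge and are otherwise re-examined on every visit.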

Okay, in that case the compile-time sounds fine.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
13170 ↗	(On Diff #100701)	Can you point to an example testcase where the run before legalization matters?
lib/Target/AArch64/AArch64ISelLowering.cpp
9425 ↗	(On Diff #100701)	How is this related to the other change?
test/CodeGen/X86/bigstructret.ll
24 ↗	(On Diff #100701)	We should be able to do something more clever here... but I guess it's not important.

niravd added inline comments. Jun 1 2017, 7:08 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
13170 ↗	(On Diff #100701)	The most obvious is: merge_vec_element_store and merge_vec_extract_stores in CodeGen/X86/MergeConsecutiveStores have issues related to bitcasting simplification. From what I've seen, this is because we do a bitcasting op for a vector with index 0 that we don't do for the others. There are related issues that also need addressing before killing the pre-legalization merge; these seem simple enough. redundant_stores_merging and overlapping_stores_merging in CodeGen/X86/stores-merging.ll fail to merge because BaseIndexOffset cannot look through X86ISD::Wrapper nodes nor extract the offset from TargetGlobals, preventing otherwise obvious merges. The latter is also true for CodeGen/AMDGPU/merge-stores.ll. Truncated stores are neither generated nor used as valid inputs for store merge. CodeGen/AArch64/merge-store.ll adds extra nodes due to bitcasting.
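
To make the notion of a merge candidate concrete, here is a simplified model, not the DAGCombiner code: each store is reduced to a (base, offset, size) triple, roughly what a base-plus-offset decomposition yields, and a helper reports whether a group of stores forms one gap-free run from the same base. `StoreInfo` and `isConsecutiveRun` are hypothetical names. If the base cannot be identified, for example because it is hidden behind a wrapper node, stores that are adjacent in memory end up with different `Base` values and the check fails.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified view of a store: symbolic base, byte offset, byte size.
struct StoreInfo {
  int Base;            // opaque id of the base pointer
  std::int64_t Offset; // byte offset from the base
  unsigned Size;       // number of bytes written
};

// Returns true if all stores share one base and, once sorted by offset,
// each store begins exactly where the previous one ends.
inline bool isConsecutiveRun(std::vector<StoreInfo> Stores) {
  if (Stores.size() < 2)
    return false;
  std::sort(Stores.begin(), Stores.end(),
            [](const StoreInfo &A, const StoreInfo &B) {
              return A.Offset < B.Offset;
            });
  for (std::size_t I = 1; I < Stores.size(); ++I) {
    const StoreInfo &Prev = Stores[I - 1];
    const StoreInfo &Cur = Stores[I];
    if (Cur.Base != Prev.Base)
      return false;
    if (Cur.Offset != Prev.Offset + static_cast<std::int64_t>(Prev.Size))
      return false;
  }
  return true;
}
```

The model ignores everything else the real pass must check (aliasing, chains, legality of the merged type); it only shows why an unrecognised base defeats the merge.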
lib/Target/AArch64/AArch64ISelLowering.cpp
9425 ↗	(On Diff #100701)	Amongst other things, splitStores replaces vector stores of zero/scalars with stores of appropriate sizes so that, as machine instructions, we can create paired memory operations, which may be cheaper. This should be run whenever we create a new, larger store. Concrete examples of the effects can be seen in the CodeGen/AArch64/ldst-opt.ll test. This patch nominally depends on D33518, which fixes untested cases caught by the recent store merge optimizations.

efriedma added inline comments. Jun 14 2017, 4:11 PM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
13170 ↗	(On Diff #100701)	"truncated stores are neither generated or used as valid inputs for store merge." Oh, so it completely breaks everything that isn't x86. :) Could you make before/after/both an option, so it's easier to check the impact if we do run into issues? If we really just want to run this after legalization, it's probably a good idea to flip the switch as soon as possible, even if it causes minor regressions.

RKSimon resigned from this revision. Oct 29 2017, 5:14 AM

Resurrecting after long hiatus.

Herald added subscribers: nhaehnle, sdardis. Nov 6 2017, 1:22 PM

The AArch64 test changes look fine.

If the AArch64 tests look good, I think this should be landable. The only remaining test that is more than a simple merge is the Mips/cconv/vector.ll test, where the store merging allows the value stored on the stack to be forwarded to a now-matching load, and the stores and loads are excised.

In D33675#918302, @efriedma wrote:

The AArch64 test changes look fine.

Ping. I think we're all set here. Can I get an LGTM?

lgtm

This revision is now accepted and ready to land. Nov 17 2017, 1:38 PM

Closed by commit rL319036: [DAG] Do MergeConsecutiveStores again before Instruction Selection (authored by niravd). Nov 27 2017, 7:31 AM

This revision was automatically updated to reflect the committed changes.

Hi Nirav,

Could you please revert the changes? They affected Arm targets (Thumb2 code).
The following sequence of stores:

MOVS     r0,#0xe5
STRB     r0,[r6,#0x1e5]
MOVS     r0,#0xe4
STRB     r0,[r6,#0x1e4]
MOVS     r0,#0xe6
STRB     r0,[r6,#0x1e6]
MOVS     r0,#0xe7
STRB     r0,[r6,#0x1e7]

is optimised into

MVN      r0,#0x1b
STR      r0,[r6,#0x1e4]

causing incorrect data to be written.

We are working on a reproducer.

Thanks,
Evgeny Astigeevich
The Arm Compiler Optimisation team

A reproducer:

test.ll (11 KB)

$ cat test.ll
...
  %v304 = getelementptr inbounds i8, i8* %v50, i32 508
  store i8 -4, i8* %v304, align 1
  %v305 = getelementptr inbounds i8, i8* %v50, i32 509
  store i8 -3, i8* %v305, align 1
  %v306 = getelementptr inbounds i8, i8* %v50, i32 510
  store i8 -2, i8* %v306, align 1
  %v307 = getelementptr inbounds i8, i8* %v50, i32 511
  store i8 -1, i8* %v307, align 1
...
$ llc -O3 -filetype=asm -o test.s test.ll
$ cat test.s
...
        movs    r1, #251
        strb.w  r1, [r0, #507]
        mvn     r1, #3 <========= HERE the problem: -4, -1, -1, -1 is written instead of -4, -3, -2, -1
        str.w   r1, [r0, #508]
        bx      lr
.Lfunc_end0:
        .size   test, .Lfunc_end0-test
        .cantunwind
        .fnend
...
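
The mismatch can be checked by hand. The sketch below, which is not part of the reproducer, packs four byte constants into the 32-bit little-endian word a merged store should write and compares it with the value that `mvn r1, #3` materialises (the bitwise NOT of 3). `packBytesLE`, `expectedMergedWord` and `emittedWord` are helper names chosen for this example.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Pack four bytes into a 32-bit word so that Bytes[0] lands at the lowest
// address when the word is stored on a little-endian target.
inline std::uint32_t packBytesLE(const std::array<std::uint8_t, 4> &Bytes) {
  std::uint32_t Word = 0;
  for (unsigned I = 0; I < 4; ++I)
    Word |= static_cast<std::uint32_t>(Bytes[I]) << (8 * I);
  return Word;
}

// Word a correct merge of the byte stores -4, -3, -2, -1 should produce.
inline std::uint32_t expectedMergedWord() {
  return packBytesLE({0xFC, 0xFD, 0xFE, 0xFF});
}

// Word produced by `mvn r1, #3`: the bitwise NOT of the immediate.
inline std::uint32_t emittedWord() { return ~std::uint32_t{3}; }
```

The expected word is 0xFFFEFDFC, while the emitted one is 0xFFFFFFFC, whose bytes in memory are -4, -1, -1, -1. Only the lowest byte survives, which is consistent with the earlier Thumb2 report, where `mvn r0, #0x1b` yields 0xFFFFFFE4 instead of 0xE7E6E5E4.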

These changes caused Clang to crash when it compiled spec2006 403.gcc for AArch64. I am working on a reproducer.

These changes caused failures of AArch64 NEON Emperor tests.

spatel mentioned this in D40790: DAGCombiner bugfix in MergeStoresOfConstantsOrVecElts(). Dec 4 2017, 3:33 PM

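Revision Contents

Path                                                         Size
llvm/trunk/include/llvm/CodeGen/TargetLowering.h             2 lines
llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.cpp        2 lines
llvm/trunk/test/CodeGen/AArch64/arm64-complex-ret.ll         3 lines
llvm/trunk/test/CodeGen/AArch64/arm64-narrow-st-merge.ll     4 lines
llvm/trunk/test/CodeGen/AArch64/arm64-variadic-aapcs.ll      16 lines
llvm/trunk/test/CodeGen/AArch64/tailcall-explicit-sret.ll    14 lines
llvm/trunk/test/CodeGen/AArch64/tailcall-implicit-sret.ll    12 lines
llvm/trunk/test/CodeGen/AMDGPU/amdgpu.private-memory.ll      3 lines
llvm/trunk/test/CodeGen/ARM/fp16-promote.ll                  50 lines
llvm/trunk/test/CodeGen/BPF/undef.ll                         11 lines
llvm/trunk/test/CodeGen/Mips/cconv/vector.ll                 30 lines
llvm/trunk/test/CodeGen/Mips/llvm-ir/extractelement.ll       3 lines
llvm/trunk/test/CodeGen/SystemZ/fp-move-13.ll                6 lines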

Diff 124379

llvm/trunk/include/llvm/CodeGen/TargetLowering.h

@@ ... @@ virtual bool storeOfVectorConstantIsCheap(EVT MemVT,
                                             unsigned NumElem,
                                             unsigned AddrSpace) const {
     return false;
   }

   /// Allow store merging after legalization in addition to before legalization.
   /// This may catch stores that do not exist earlier (eg, stores created from
   /// intrinsics).
-  virtual bool mergeStoresAfterLegalization() const { return false; }
+  virtual bool mergeStoresAfterLegalization() const { return true; }

   /// Returns if it's reasonable to merge stores to MemVT size.
   virtual bool canMergeStoresTo(unsigned AS, EVT MemVT,
                                 const SelectionDAG &DAG) const {
     return true;
   }

   /// \brief Return true if it is cheap to speculate a call to intrinsic cttz.

llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.cpp

@@ ... @@ if (IndexNotInserted.any())
     return SDValue();

   return splitStoreSplat(DAG, St, SplatVal, NumVecElts);
 }

 static SDValue splitStores(SDNode *N, TargetLowering::DAGCombinerInfo &DCI,
                            SelectionDAG &DAG,
                            const AArch64Subtarget *Subtarget) {
-  if (!DCI.isBeforeLegalize())
-    return SDValue();
-
   StoreSDNode *S = cast<StoreSDNode>(N);
   if (S->isVolatile() || S->isIndexed())
     return SDValue();

   SDValue StVal = S->getValue();
   EVT VT = StVal.getValueType();
   if (!VT.isVector())

llvm/trunk/test/CodeGen/AArch64/arm64-complex-ret.ll

 ; RUN: llc -mtriple=arm64-eabi -o - %s | FileCheck %s

 define { i192, i192, i21, i192 } @foo(i192) {
 ; CHECK-LABEL: foo:
-; CHECK: stp xzr, xzr, [x8]
+; CHECK-DAG: str xzr, [x8, #16]
+; CHECK-DAG: str q0, [x8]
 ret { i192, i192, i21, i192 } {i192 0, i192 1, i21 2, i192 3}
 }

llvm/trunk/test/CodeGen/AArch64/arm64-narrow-st-merge.ll

@@ ... @@ entry:
 %add = add nsw i32 %n, 1
 %idxprom1 = sext i32 %add to i64
 %arrayidx2 = getelementptr inbounds i16, i16* %P, i64 %idxprom1
 store i16 0, i16* %arrayidx2
 ret void
 }

 ; CHECK-LABEL: Strh_zero_4
-; CHECK: stp wzr, wzr
+; CHECK: str xzr
 ; CHECK-STRICT-LABEL: Strh_zero_4
 ; CHECK-STRICT: strh wzr
 ; CHECK-STRICT: strh wzr
 ; CHECK-STRICT: strh wzr
 ; CHECK-STRICT: strh wzr
 define void @Strh_zero_4(i16* nocapture %P, i32 %n) {
 entry:
 %idxprom = sext i32 %n to i64
@@ ... @@ entry:
 %sub1 = add nsw i32 %n, -3
 %idxprom2 = sext i32 %sub1 to i64
 %arrayidx3 = getelementptr inbounds i16, i16* %P, i64 %idxprom2
 store i16 0, i16* %arrayidx3
 ret void
 }

 ; CHECK-LABEL: Sturh_zero_4
-; CHECK: stp wzr, wzr
+; CHECK: stur xzr
 ; CHECK-STRICT-LABEL: Sturh_zero_4
 ; CHECK-STRICT: sturh wzr
 ; CHECK-STRICT: sturh wzr
 ; CHECK-STRICT: sturh wzr
 ; CHECK-STRICT: sturh wzr
 define void @Sturh_zero_4(i16* nocapture %P, i32 %n) {
 entry:
 %sub = add nsw i32 %n, -3

llvm/trunk/test/CodeGen/AArch64/arm64-variadic-aapcs.ll

@@ ... @@
 ; CHECK: add [[GR_TOPTMP:x[0-9]+]], sp, #[[GR_BASE]]
 ; CHECK: add [[GR_TOP:x[0-9]+]], [[GR_TOPTMP]], #56
 ; CHECK: str [[GR_TOP]], [x[[VA_LIST]], #8]

 ; CHECK: mov [[VR_TOPTMP:x[0-9]+]], sp
 ; CHECK: add [[VR_TOP:x[0-9]+]], [[VR_TOPTMP]], #128
 ; CHECK: str [[VR_TOP]], [x[[VA_LIST]], #16]

-; CHECK: mov [[GR_OFFS:w[0-9]+]], #-56
-; CHECK: str [[GR_OFFS]], [x[[VA_LIST]], #24]
-
-; CHECK: orr [[VR_OFFS:w[0-9]+]], wzr, #0xffffff80
-; CHECK: str [[VR_OFFS]], [x[[VA_LIST]], #28]
+; CHECK: mov [[GRVR:x[0-9]+]], #-545460846720
+; CHECK: movk [[GRVR]], #65480
+; CHECK: str [[GRVR]], [x[[VA_LIST]], #24]

 %addr = bitcast %va_list* @var to i8*
 call void @llvm.va_start(i8* %addr)

 ret void
 }

 define void @test_fewargs(i32 %n, i32 %n1, i32 %n2, float %m, ...) {
@@ ... @@
 ; CHECK: add [[GR_TOPTMP:x[0-9]+]], sp, #[[GR_BASE]]
 ; CHECK: add [[GR_TOP:x[0-9]+]], [[GR_TOPTMP]], #40
 ; CHECK: str [[GR_TOP]], [x[[VA_LIST]], #8]

 ; CHECK: mov [[VR_TOPTMP:x[0-9]+]], sp
 ; CHECK: add [[VR_TOP:x[0-9]+]], [[VR_TOPTMP]], #112
 ; CHECK: str [[VR_TOP]], [x[[VA_LIST]], #16]

-; CHECK: mov [[GR_OFFS:w[0-9]+]], #-40
-; CHECK: str [[GR_OFFS]], [x[[VA_LIST]], #24]
-
-; CHECK: mov [[VR_OFFS:w[0-9]+]], #-11
-; CHECK: str [[VR_OFFS]], [x[[VA_LIST]], #28]
+; CHECK: mov [[GRVR_OFFS:x[0-9]+]], #-40
+; CHECK: movk [[GRVR_OFFS]], #65424, lsl #32
+; CHECK: str [[GRVR_OFFS]], [x[[VA_LIST]], #24]

 %addr = bitcast %va_list* @var to i8*
 call void @llvm.va_start(i8* %addr)

 ret void
 }

 define void @test_nospare([8 x i64], [8 x float], ...) {
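
The new CHECK lines in this test build one 64-bit constant with `mov` plus `movk` and store it at offset 24, where the old ones stored two 32-bit values at offsets 24 and 28. As a sanity check, and assuming the two fields are 32-bit offsets laid out little-endian, the sketch below models `movk` (replace one 16-bit lane, keep the rest) and splits the result back into its halves. `movk`, `lowHalf` and `highHalf` are helper names used for this example only.

```cpp
#include <cassert>
#include <cstdint>

// Model of AArch64 MOVK: overwrite the 16-bit lane at bit position `Shift`
// (0, 16, 32 or 48) with `Imm16` and keep all other bits of the register.
inline std::uint64_t movk(std::uint64_t Reg, std::uint16_t Imm16,
                          unsigned Shift) {
  const std::uint64_t Mask = std::uint64_t{0xFFFF} << Shift;
  return (Reg & ~Mask) | (static_cast<std::uint64_t>(Imm16) << Shift);
}

// Low and high 32-bit halves of a 64-bit store on a little-endian target,
// i.e. the words that end up at offsets +0 and +4 of the store address.
inline std::int32_t lowHalf(std::uint64_t V) {
  return static_cast<std::int32_t>(static_cast<std::uint32_t>(V));
}
inline std::int32_t highHalf(std::uint64_t V) {
  return static_cast<std::int32_t>(static_cast<std::uint32_t>(V >> 32));
}
```

Feeding in the immediates from the first hunk (`#-545460846720`, then `movk #65480`) gives halves of -56 and -128, matching the old `#-56` and `#0xffffff80` checks. The second hunk (`#-40`, then `movk #65424, lsl #32`) gives -40 and -112; the old `#-11` pattern presumably matched `#-112` as a substring.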

llvm/trunk/test/CodeGen/AArch64/tailcall-explicit-sret.ll

@@ ... @@
 ; CHECK: ret
 define void @test_tailcall_explicit_sret_alloca_unused() #0 {
 %l = alloca i1024, align 8
 tail call void @test_explicit_sret(i1024* %l)
 ret void
 }

 ; CHECK-LABEL: _test_tailcall_explicit_sret_alloca_dummyusers:
-; CHECK: ldr [[PTRLOAD1:x[0-9]+]], [x0]
+; CHECK: ldr [[PTRLOAD1:q[0-9]+]], [x0]
 ; CHECK: str [[PTRLOAD1]], [sp]
 ; CHECK: mov x8, sp
 ; CHECK-NEXT: bl _test_explicit_sret
 ; CHECK: ret
 define void @test_tailcall_explicit_sret_alloca_dummyusers(i1024* %ptr) #0 {
 %l = alloca i1024, align 8
 %r = load i1024, i1024* %ptr, align 8
 store i1024 %r, i1024* %l, align 8
@@ ... @@ define void @test_tailcall_explicit_sret_gep(i1024* %ptr) #0 {
 tail call void @test_explicit_sret(i1024* %ptr2)
 ret void
 }

 ; CHECK-LABEL: _test_tailcall_explicit_sret_alloca_returned:
 ; CHECK: mov x[[CALLERX8NUM:[0-9]+]], x8
 ; CHECK: mov x8, sp
 ; CHECK-NEXT: bl _test_explicit_sret
-; CHECK-NEXT: ldr [[CALLERSRET1:x[0-9]+]], [sp]
-; CHECK: str [[CALLERSRET1:x[0-9]+]], [x[[CALLERX8NUM]]]
+; CHECK-NEXT: ldr [[CALLERSRET1:q[0-9]+]], [sp]
+; CHECK: str [[CALLERSRET1:q[0-9]+]], [x[[CALLERX8NUM]]]
 ; CHECK: ret
 define i1024 @test_tailcall_explicit_sret_alloca_returned() #0 {
 %l = alloca i1024, align 8
 tail call void @test_explicit_sret(i1024* %l)
 %r = load i1024, i1024* %l, align 8
 ret i1024 %r
 }

 ; CHECK-LABEL: _test_indirect_tailcall_explicit_sret_nosret_arg:
 ; CHECK-DAG: mov x[[CALLERX8NUM:[0-9]+]], x8
 ; CHECK-DAG: mov [[FPTR:x[0-9]+]], x0
 ; CHECK: mov x0, sp
 ; CHECK-NEXT: blr [[FPTR]]
-; CHECK-NEXT: ldr [[CALLERSRET1:x[0-9]+]], [sp]
-; CHECK: str [[CALLERSRET1:x[0-9]+]], [x[[CALLERX8NUM]]]
+; CHECK: ldr [[CALLERSRET1:q[0-9]+]], [sp]
+; CHECK: str [[CALLERSRET1:q[0-9]+]], [x[[CALLERX8NUM]]]
 ; CHECK: ret
 define void @test_indirect_tailcall_explicit_sret_nosret_arg(i1024* sret %arg, void (i1024*)* %f) #0 {
 %l = alloca i1024, align 8
 tail call void %f(i1024* %l)
 %r = load i1024, i1024* %l, align 8
 store i1024 %r, i1024* %arg, align 8
 ret void
 }

 ; CHECK-LABEL: _test_indirect_tailcall_explicit_sret_:
 ; CHECK: mov x[[CALLERX8NUM:[0-9]+]], x8
 ; CHECK: mov x8, sp
 ; CHECK-NEXT: blr x0
-; CHECK-NEXT: ldr [[CALLERSRET1:x[0-9]+]], [sp]
-; CHECK: str [[CALLERSRET1:x[0-9]+]], [x[[CALLERX8NUM]]]
+; CHECK: ldr [[CALLERSRET1:q[0-9]+]], [sp]
+; CHECK: str [[CALLERSRET1:q[0-9]+]], [x[[CALLERX8NUM]]]
 ; CHECK: ret
 define void @test_indirect_tailcall_explicit_sret_(i1024* sret %arg, i1024 ()* %f) #0 {
 %ret = tail call i1024 %f()
 store i1024 %ret, i1024* %arg, align 8
 ret void
 }

 attributes #0 = { nounwind }

llvm/trunk/test/CodeGen/AArch64/tailcall-implicit-sret.ll

 ; RUN: llc < %s -mtriple arm64-apple-darwin -aarch64-enable-ldst-opt=false -disable-post-ra -asm-verbose=false | FileCheck %s
 ; Disable the load/store optimizer to avoid having LDP/STPs and simplify checks.

 target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"

 ; Check that we don't try to tail-call with an sret-demoted return.

 declare i1024 @test_sret() #0

 ; CHECK-LABEL: _test_call_sret:
 ; CHECK: mov x[[CALLERX8NUM:[0-9]+]], x8
 ; CHECK: mov x8, sp
 ; CHECK-NEXT: bl _test_sret
-; CHECK-NEXT: ldr [[CALLERSRET1:x[0-9]+]], [sp]
-; CHECK: str [[CALLERSRET1:x[0-9]+]], [x[[CALLERX8NUM]]]
+; CHECK: ldr [[CALLERSRET1:q[0-9]+]], [sp]
+; CHECK: str [[CALLERSRET1:q[0-9]+]], [x[[CALLERX8NUM]]]
 ; CHECK: ret
 define i1024 @test_call_sret() #0 {
 %a = call i1024 @test_sret()
 ret i1024 %a
 }

 ; CHECK-LABEL: _test_tailcall_sret:
 ; CHECK: mov x[[CALLERX8NUM:[0-9]+]], x8
 ; CHECK: mov x8, sp
 ; CHECK-NEXT: bl _test_sret
-; CHECK-NEXT: ldr [[CALLERSRET1:x[0-9]+]], [sp]
-; CHECK: str [[CALLERSRET1:x[0-9]+]], [x[[CALLERX8NUM]]]
+; CHECK: ldr [[CALLERSRET1:q[0-9]+]], [sp]
+; CHECK: str [[CALLERSRET1:q[0-9]+]], [x[[CALLERX8NUM]]]
 ; CHECK: ret
 define i1024 @test_tailcall_sret() #0 {
 %a = tail call i1024 @test_sret()
 ret i1024 %a
 }

 ; CHECK-LABEL: _test_indirect_tailcall_sret:
 ; CHECK: mov x[[CALLERX8NUM:[0-9]+]], x8
 ; CHECK: mov x8, sp
 ; CHECK-NEXT: blr x0
-; CHECK-NEXT: ldr [[CALLERSRET1:x[0-9]+]], [sp]
-; CHECK: str [[CALLERSRET1:x[0-9]+]], [x[[CALLERX8NUM]]]
+; CHECK: ldr [[CALLERSRET1:q[0-9]+]], [sp]
+; CHECK: str [[CALLERSRET1:q[0-9]+]], [x[[CALLERX8NUM]]]
 ; CHECK: ret
 define i1024 @test_indirect_tailcall_sret(i1024 ()* %f) #0 {
 %a = tail call i1024 %f()
 ret i1024 %a
 }

 attributes #0 = { nounwind }

llvm/trunk/test/CodeGen/AMDGPU/amdgpu.private-memory.ll

@@ ... @@ entry:
 store i32 %5, i32 addrspace(1)* %out
 ret void
 }

 ; FUNC-LABEL: {{^}}char_array:

 ; R600: MOVA_INT

-; SI-PROMOTE-DAG: buffer_store_byte v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offset:4 ; encoding:
-; SI-PROMOTE-DAG: buffer_store_byte v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offset:5 ; encoding:
+; SI-PROMOTE-DAG: buffer_store_short v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offset:4 ; encoding:

 ; SI-ALLOCA-DAG: buffer_store_byte v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offset:4 ; encoding: [0x04,0x00,0x60,0xe0
 ; SI-ALLOCA-DAG: buffer_store_byte v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offset:5 ; encoding: [0x05,0x00,0x60,0xe0
 define amdgpu_kernel void @char_array(i32 addrspace(1)* %out, i32 %index) #0 {
 entry:
 %0 = alloca [2 x i8]
 %1 = getelementptr inbounds [2 x i8], [2 x i8]* %0, i32 0, i32 0
 %2 = getelementptr inbounds [2 x i8], [2 x i8]* %0, i32 0, i32 1

llvm/trunk/test/CodeGen/ARM/fp16-promote.ll

@@ ... @@
 }

 ; f16 vectors are not legal in the backend. Vector elements are not assigned
 ; to the register, but are stored in the stack instead. Hence insertelement
 ; and extractelement have these extra loads and stores.

 ; CHECK-ALL-LABEL: test_insertelement:
 ; CHECK-ALL: sub sp, sp, #8
-; CHECK-ALL: ldrh
-; CHECK-ALL: ldrh
-; CHECK-ALL: ldrh
-; CHECK-ALL: ldrh
-; CHECK-ALL-DAG: strh
-; CHECK-ALL-DAG: strh
-; CHECK-ALL-DAG: mov
-; CHECK-ALL-DAG: ldrh
-; CHECK-ALL-DAG: orr
-; CHECK-ALL-DAG: strh
-; CHECK-ALL-DAG: strh
-; CHECK-ALL-DAG: strh
-; CHECK-ALL-DAG: ldrh
-; CHECK-ALL-DAG: ldrh
-; CHECK-ALL-DAG: ldrh
-; CHECK-ALL-DAG: strh
-; CHECK-ALL-DAG: strh
-; CHECK-ALL-DAG: strh
-; CHECK-ALL-DAG: strh
+
+; CHECK-VFP: and
+; CHECK-VFP: mov
+; CHECK-VFP: ldrd
+; CHECK-VFP: orr
+; CHECK-VFP: ldrh
+; CHECK-VFP: stm
+; CHECK-VFP: strh
+; CHECK-VFP: ldm
+; CHECK-VFP: stm
+
+; CHECK-NOVFP: ldrh
+; CHECK-NOVFP: ldrh
+; CHECK-NOVFP: ldrh
+; CHECK-NOVFP: ldrh
+; CHECK-NOVFP-DAG: strh
+; CHECK-NOVFP-DAG: strh
+; CHECK-NOVFP-DAG: mov
+; CHECK-NOVFP-DAG: ldrh
+; CHECK-NOVFP-DAG: orr
+; CHECK-NOVFP-DAG: strh
+; CHECK-NOVFP-DAG: strh
+; CHECK-NOVFP-DAG: strh
+; CHECK-NOVFP-DAG: ldrh
+; CHECK-NOVFP-DAG: ldrh
+; CHECK-NOVFP-DAG: ldrh
+; CHECK-NOVFP-DAG: strh
+; CHECK-NOVFP-DAG: strh
+; CHECK-NOVFP-DAG: strh
+; CHECK-NOVFP-DAG: strh

 ; CHECK-ALL: add sp, sp, #8
 define void @test_insertelement(half* %p, <4 x half>* %q, i32 %i) #0 {
 %a = load half, half* %p, align 2
 %b = load <4 x half>, <4 x half>* %q, align 8
 %c = insertelement <4 x half> %b, half %a, i32 %i
 store <4 x half> %c, <4 x half>* %q
 ret void
 }

llvm/trunk/test/CodeGen/BPF/undef.ll

 ; RUN: llc < %s -march=bpfel | FileCheck -check-prefixes=CHECK,EL %s
 ; RUN: llc < %s -march=bpfeb | FileCheck -check-prefixes=CHECK,EB %s

 %struct.bpf_map_def = type { i32, i32, i32, i32 }
 %struct.__sk_buff = type opaque
 %struct.routing_key_2 = type { [6 x i8] }

 @routing = global %struct.bpf_map_def { i32 1, i32 6, i32 12, i32 1024 }, section "maps", align 4
 @routing_miss_0 = global %struct.bpf_map_def { i32 1, i32 1, i32 12, i32 1 }, section "maps", align 4
 @test1 = global %struct.bpf_map_def { i32 2, i32 4, i32 8, i32 1024 }, section "maps", align 4
 @test1_miss_4 = global %struct.bpf_map_def { i32 2, i32 1, i32 8, i32 1 }, section "maps", align 4
 @_license = global [4 x i8] c"GPL\00", section "license", align 1
 @llvm.used = appending global [6 x i8*] [i8* getelementptr inbounds ([4 x i8], [4 x i8]* @_license, i32 0, i32 0), i8* bitcast (i32 (%struct.__sk_buff*)* @ebpf_filter to i8*), i8* bitcast (%struct.bpf_map_def* @routing to i8*), i8* bitcast (%struct.bpf_map_def* @routing_miss_0 to i8*), i8* bitcast (%struct.bpf_map_def* @test1 to i8*), i8* bitcast (%struct.bpf_map_def* @test1_miss_4 to i8*)], section "llvm.metadata"

 ; Function Attrs: nounwind uwtable
 define i32 @ebpf_filter(%struct.__sk_buff* nocapture readnone %ebpf_packet) #0 section "socket1" {
-; EL: r1 = 134678021
-; EB: r1 = 84281096
-; CHECK: *(u32 *)(r10 - 8) = r1
-; EL: r1 = 2569
-; EB: r1 = 2314
-; CHECK: *(u16 *)(r10 - 4) = r1
+
+; EL: r1 = 11033905661445 ll
+; EB: r1 = 361984551142686720 ll
+; CHECK: *(u64 *)(r10 - 8) = r1

 ; CHECK: r1 = 0
 ; CHECK: *(u16 *)(r10 + 24) = r1
 ; CHECK: *(u16 *)(r10 + 22) = r1
 ; CHECK: *(u16 *)(r10 + 20) = r1
 ; CHECK: *(u16 *)(r10 + 18) = r1
 ; CHECK: *(u16 *)(r10 + 16) = r1
 ; CHECK: *(u16 *)(r10 + 14) = r1
 ; CHECK: *(u16 *)(r10 + 12) = r1
 ; CHECK: *(u16 *)(r10 + 10) = r1
 ; CHECK: *(u16 *)(r10 + 8) = r1
 ; CHECK: *(u16 *)(r10 + 6) = r1
 ; CHECK: *(u16 *)(r10 + 4) = r1
 ; CHECK: *(u16 *)(r10 + 2) = r1
 ; CHECK: *(u16 *)(r10 + 0) = r1
-; CHECK: *(u16 *)(r10 - 2) = r1
 ; CHECK: *(u16 *)(r10 + 26) = r1

 ; CHECK: r2 = r10
 ; CHECK: r2 += -8
 ; CHECK: r1 = routing
 ; CHECK: call bpf_map_lookup_elem
 ; CHECK: exit
 %key = alloca %struct.routing_key_2, align 1
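
The new immediates in this test can be reproduced by hand. Assuming the old 32-bit store at `r10 - 8` and the old 16-bit store at `r10 - 4` are merged into one 64-bit store whose remaining two bytes are zero, the sketch below rebuilds the merged constant for both byte orders. `mergeLE` and `mergeBE` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Little-endian: the lowest address holds the least significant byte, so
// the 32-bit value sits in bits 0..31 and the 16-bit value in bits 32..47.
inline std::uint64_t mergeLE(std::uint32_t Word, std::uint16_t Half) {
  return static_cast<std::uint64_t>(Word) |
         (static_cast<std::uint64_t>(Half) << 32);
}

// Big-endian: the lowest address holds the most significant byte, so the
// 32-bit value sits in bits 32..63 and the 16-bit value in bits 16..31.
inline std::uint64_t mergeBE(std::uint32_t Word, std::uint16_t Half) {
  return (static_cast<std::uint64_t>(Word) << 32) |
         (static_cast<std::uint64_t>(Half) << 16);
}
```

With the old EL constants (134678021 and 2569) `mergeLE` yields 11033905661445, and with the old EB constants (84281096 and 2314) `mergeBE` yields 361984551142686720, which are the values in the updated CHECK lines.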

llvm/trunk/test/CodeGen/Mips/cconv/vector.ll

	Show First 20 Lines • Show All 815 Lines • ▼ Show 20 Lines
	; MIPS32-NOT: addiu $7			; MIPS32-NOT: addiu $7

	; MIPS32R5-DAG: lhu $4, {{[0-9]+}}($sp)			; MIPS32R5-DAG: lhu $4, {{[0-9]+}}($sp)
	; MIPS32R5-DAG: lhu $5, {{[0-9]+}}($sp)			; MIPS32R5-DAG: lhu $5, {{[0-9]+}}($sp)

	; MIPS32R5: jal			; MIPS32R5: jal
	; MIPS32R5: sw $2, {{[0-9]+}}($sp)			; MIPS32R5: sw $2, {{[0-9]+}}($sp)

	; MIPS32R5-DAG: sb ${{[0-9]+}}, 1(${{[0-9]+}})			; MIPS32R5-DAG; sh ${{[0-9]+}}, %lo(gv2i8)(${{[0-9]+}})
	; MIPS32R5-DAG; sb ${{[0-9]+}}, %lo(gv2i8)(${{[0-9]+}})
				; MIPS32R5-NOT: sb ${{[0-9]+}}, 1(${{[0-9]+}})
				; MIPS32R5-NOT; sb ${{[0-9]+}}, %lo(gv2i8)(${{[0-9]+}})

	; MIPS64EB: daddiu $4, $zero, 1543			; MIPS64EB: daddiu $4, $zero, 1543
	; MIPS64EB: daddiu $5, $zero, 3080			; MIPS64EB: daddiu $5, $zero, 3080

	; MIPS64EL: daddiu $4, $zero, 1798			; MIPS64EL: daddiu $4, $zero, 1798
	; MIPS64EL; daddiu $5, $zero, 2060			; MIPS64EL; daddiu $5, $zero, 2060

	; MIPS64R5-DAG: lh $4			; MIPS64R5-DAG: lh $4
	Show All 31 Lines
	define void @call_i8_4() {			define void @call_i8_4() {
	entry:			entry:
	; ALL-LABEL: call_i8_4:			; ALL-LABEL: call_i8_4:
	; MIPS32: ori $4			; MIPS32: ori $4
	; MIPS32: ori $5			; MIPS32: ori $5
	; MIPS32-NOT: ori $6			; MIPS32-NOT: ori $6
	; MIPS32-NOT: ori $7			; MIPS32-NOT: ori $7

	; MIPS32R5-DAG: lw $4, {{[0-9]+}}($sp)			; MIPS32R5-NOT: lw $4, {{[0-9]+}}($sp)
	; MIPS32R5-DAG: lw $5, {{[0-9]+}}($sp)			; MIPS32R5-NOT: lw $5, {{[0-9]+}}($sp)

	; MIPS64: ori $4			; MIPS64: ori $4
	; MIPS64: ori $5			; MIPS64: ori $5

	; MIPS64R5: lw $4			; MIPS64R5-NOT: lw $4
	; MIPS64R5: lw $5			; MIPS64R5-NOT: lw $5

	; MIPS32: jal i8_4			; MIPS32: jal i8_4
	; MIPS64: jalr $25			; MIPS64: jalr $25

	; MIPS32: sw $2			; MIPS32: sw $2

	; MIPS32R5-DAG: sw $2			; MIPS32R5-DAG: sw $2


	define void @calli16_2() {			define void @calli16_2() {
	entry:			entry:
	; ALL-LABEL: calli16_2:			; ALL-LABEL: calli16_2:

	; MIPS32-DAG: ori $4			; MIPS32-DAG: ori $4
	; MIPS32-DAG: ori $5			; MIPS32-DAG: ori $5

	; MIPS32R5-DAG: lw $4			; MIPS32R5-NOT: lw $4
	; MIPS32R5-DAG: lw $5			; MIPS32R5-NOT: lw $5

	; MIPS64: ori $4			; MIPS64: ori $4
	; MIPS64: ori $5			; MIPS64: ori $5

	; MIPS64R5-DAG: lw $4			; MIPS64R5-NOT: lw $4
	; MIPS64R5-DAG: lw $5			; MIPS64R5-NOT: lw $5

	; MIPS32: jal i16_2			; MIPS32: jal i16_2
	; MIPS64: jalr $25			; MIPS64: jalr $25

	; MIPS32: sw $2, %lo(gv2i16)			; MIPS32: sw $2, %lo(gv2i16)

	; MIPS32R5: sw $2, %lo(gv2i16)			; MIPS32R5: sw $2, %lo(gv2i16)

	; MIPS32R5-DAG: ori $4			; MIPS32R5-DAG: ori $4
	; MIPS32R5-DAG: ori $5			; MIPS32R5-DAG: ori $5
	; MIPS32R5-DAG: ori $6			; MIPS32R5-DAG: ori $6
	; MIPS32R5-DAG: move $7			; MIPS32R5-DAG: move $7

	; MIPS64-DAG: daddiu $4			; MIPS64-DAG: daddiu $4
	; MIPS64-DAG: daddiu $5			; MIPS64-DAG: daddiu $5

	; MIPS64R5-DAG: ld $4			; MIPS64R5-NOT: ld $4
	; MIPS64R5-DAG: ld $5			; MIPS64R5-NOT: ld $5

	; MIPS32: jal i16_4			; MIPS32: jal i16_4
	; MIPS64: jalr $25			; MIPS64: jalr $25

	; MIPS32-DAG: sw $3, 4(${{[0-9]+}})			; MIPS32-DAG: sw $3, 4(${{[0-9]+}})
	; MIPS32-DAG: sw $2, %lo(gv4i16)(${{[0-9]+}})			; MIPS32-DAG: sw $2, %lo(gv4i16)(${{[0-9]+}})

	; MIPS32R5-DAG: sw $3, 4(${{[0-9]+}})			; MIPS32R5-DAG: sw $3, 4(${{[0-9]+}})
	; MIPS32R5-DAG: addiu $4			; MIPS32R5-DAG: addiu $4
	; MIPS32R5-DAG: addiu $5			; MIPS32R5-DAG: addiu $5
	; MIPS32R5-DAG: addiu $6			; MIPS32R5-DAG: addiu $6
	; MIPS32R5-DAG: addiu $7			; MIPS32R5-DAG: addiu $7

	; MIPS64: daddiu $4			; MIPS64: daddiu $4
	; MIPS64: daddiu $5			; MIPS64: daddiu $5

	; MIPS64R5-DAG: ld $4			; MIPS64R5-NOT ld $4
	; MIPS64R5-DAG: ld $5			; MIPS64R5-NOT: ld $5

	; MIPS32: jal i32_2			; MIPS32: jal i32_2
	; MIPS64: jalr $25			; MIPS64: jalr $25

	; MIPS32-DAG: sw $2, %lo(gv2i32)(${{[0-9]+}})			; MIPS32-DAG: sw $2, %lo(gv2i32)(${{[0-9]+}})
	; MIPS32-DAG: sw $3, 4(${{[0-9]+}})			; MIPS32-DAG: sw $3, 4(${{[0-9]+}})

	; MIPS32R5-DAG: sw $2, %lo(gv2i32)(${{[0-9]+}})			; MIPS32R5-DAG: sw $2, %lo(gv2i32)(${{[0-9]+}})

llvm/trunk/test/CodeGen/Mips/llvm-ir/extractelement.ll

	; RUN: llc < %s -march=mips -mcpu=mips2 \| FileCheck %s -check-prefix=ALL			; RUN: llc < %s -march=mips -mcpu=mips2 \| FileCheck %s -check-prefix=ALL

	; This test triggered a bug in the vector splitting where the type legalizer			; This test triggered a bug in the vector splitting where the type legalizer
	; attempted to extract the element with by storing the vector, then reading			; attempted to extract the element with by storing the vector, then reading
	; an element back. However, the address calculation was:			; an element back. However, the address calculation was:
	; Base + Index * (EltSizeInBits / 8)			; Base + Index * (EltSizeInBits / 8)
	; and EltSizeInBits was 1. This caused the index to be forgotten.			; and EltSizeInBits was 1. This caused the index to be forgotten.
	define i1 @via_stack_bug(i8 signext %idx) {			define i1 @via_stack_bug(i8 signext %idx) {
	%1 = extractelement <2 x i1> <i1 false, i1 true>, i8 %idx			%1 = extractelement <2 x i1> <i1 false, i1 true>, i8 %idx
	ret i1 %1			ret i1 %1
	}			}

	; ALL-LABEL: via_stack_bug:			; ALL-LABEL: via_stack_bug:
	; ALL-DAG: addiu [[ONE:\$[0-9]+]], $zero, 1			; ALL-DAG: addiu [[ONE:\$[0-9]+]], $zero, 1
	; ALL-DAG: sb [[ONE]], 7($sp)			; ALL-DAG: sh [[ONE]], 6($sp)
	; ALL-DAG: sb $zero, 6($sp)
	; ALL-DAG: andi [[MASKED_IDX:\$[0-9]+]], $4, 1			; ALL-DAG: andi [[MASKED_IDX:\$[0-9]+]], $4, 1
	; ALL-DAG: addiu [[VPTR:\$[0-9]+]], $sp, 6			; ALL-DAG: addiu [[VPTR:\$[0-9]+]], $sp, 6
	; ALL-DAG: or [[EPTR:\$[0-9]+]], [[MASKED_IDX]], [[VPTR]]			; ALL-DAG: or [[EPTR:\$[0-9]+]], [[MASKED_IDX]], [[VPTR]]
	; ALL: lbu $2, 0([[EPTR]])			; ALL: lbu $2, 0([[EPTR]])
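
The CHECK change above replaces two byte stores (`sb [[ONE]], 7($sp)` and `sb $zero, 6($sp)`) with one halfword store (`sh [[ONE]], 6($sp)`). A minimal sketch of why the two forms leave the same bytes on the stack, assuming the big-endian byte order implied by `-march=mips`. The helper names `storeBigEndian`, `twoByteStores`, `oneHalfwordStore`, and `checkEquivalent` are hypothetical and are not part of LLVM or of this patch:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a store: write the low `sizeInBytes` bytes of `value`
// at `offset`, most significant byte first (big-endian, an assumption here).
template <std::size_t N>
void storeBigEndian(std::array<std::uint8_t, N> &mem, std::size_t offset,
                    std::uint32_t value, unsigned sizeInBytes) {
  for (unsigned i = 0; i < sizeInBytes; ++i) {
    unsigned shift = 8 * (sizeInBytes - 1 - i);
    mem[offset + i] = static_cast<std::uint8_t>(value >> shift);
  }
}

// Old sequence: sb ONE, 7($sp) followed by sb $zero, 6($sp).
std::array<std::uint8_t, 8> twoByteStores() {
  std::array<std::uint8_t, 8> stack;
  stack.fill(0xAA); // marker for bytes that no store touches
  storeBigEndian(stack, 7, 1, 1);
  storeBigEndian(stack, 6, 0, 1);
  return stack;
}

// New sequence: sh ONE, 6($sp), one 16-bit store covering offsets 6 and 7.
std::array<std::uint8_t, 8> oneHalfwordStore() {
  std::array<std::uint8_t, 8> stack;
  stack.fill(0xAA);
  storeBigEndian(stack, 6, 1, 2);
  return stack;
}

// Both sequences should produce an identical stack image.
void checkEquivalent() {
  assert(twoByteStores() == oneHalfwordStore());
}
```

Calling `checkEquivalent()` passes under these assumptions: the 16-bit value 1 is laid out as byte 0 at offset 6 and byte 1 at offset 7, which is what the two separate byte stores wrote. On a little-endian target the merged constant would have to differ, so this sketch covers only the big-endian case.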

llvm/trunk/test/CodeGen/SystemZ/fp-move-13.ll

	; CHECK: br %r14
store volatile fp128 %val, fp128 *%x		store volatile fp128 %val, fp128 *%x
ret void		ret void
}		}

; Test 128-bit moves from GPRs to VRs. i128 isn't a legitimate type,		; Test 128-bit moves from GPRs to VRs. i128 isn't a legitimate type,
; so this goes through memory.		; so this goes through memory.
define void @f2(fp128 %a, i128 %b) {		define void @f2(fp128 %a, i128 %b) {
; CHECK-LABEL: f2:		; CHECK-LABEL: f2:
; CHECK: lg		; CHECK: vl
; CHECK: lg		; CHECK: vst
; CHECK: stg
; CHECK: stg
; CHECK: br %r14		; CHECK: br %r14
%val = load i128 , i128 *%b		%val = load i128 , i128 *%b
%res = bitcast i128 %val to fp128		%res = bitcast i128 %val to fp128
store fp128 %res, fp128 *%a		store fp128 %res, fp128 *%a
ret void		ret void
}		}

; Test 128-bit moves from VRs to GPRs, with the same restriction as f2.		; Test 128-bit moves from VRs to GPRs, with the same restriction as f2.

Revision Contents

Diff 124379

llvm/trunk/include/llvm/CodeGen/TargetLowering.h

llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.cpp

llvm/trunk/test/CodeGen/AArch64/arm64-complex-ret.ll

llvm/trunk/test/CodeGen/AArch64/arm64-narrow-st-merge.ll

llvm/trunk/test/CodeGen/AArch64/arm64-variadic-aapcs.ll

llvm/trunk/test/CodeGen/AArch64/tailcall-explicit-sret.ll

llvm/trunk/test/CodeGen/AArch64/tailcall-implicit-sret.ll

llvm/trunk/test/CodeGen/AMDGPU/amdgpu.private-memory.ll

llvm/trunk/test/CodeGen/ARM/fp16-promote.ll

llvm/trunk/test/CodeGen/BPF/undef.ll

llvm/trunk/test/CodeGen/Mips/cconv/vector.ll

llvm/trunk/test/CodeGen/Mips/llvm-ir/extractelement.ll

llvm/trunk/test/CodeGen/SystemZ/fp-move-13.ll
