This is an archive of the discontinued LLVM Phabricator instance.

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
1456	s/else if/else/ and move the condition of the if into an assert.
llvm/test/CodeGen/AArch64/ldst-pairing.ll
4	CHECK-LABEL: + Give a description of what is tested.
21	ditto
49	ditto

gberry added a subscriber: mcrosier. (Feb 10 2016, 2:31 PM)

Address comments from Quentin.

qcolombet added inline comments. (Feb 10 2016, 3:44 PM)

llvm/test/CodeGen/AArch64/ldst-pairing.ll
5	Could you be more specific? In particular, explain the difference between @st1, @st2, and so on. Put differently, what is relevant in this test that was not previously tested? The FIXME in st5 is a good example: we check the merging of stores when the accesses are not in increasing order of address.

flyingforyou added a subscriber: flyingforyou. (Feb 10 2016, 3:51 PM)

Please add the non-temporal ldp/stp instructions. This should probably land after D17097.

Add nontemporal, update testcase comments.

mcrosier mentioned this in D17097: [AArch64] add mem ref to unscaled pairs. (Feb 11 2016, 6:27 AM)

All of the provided tests pass on current top of trunk (i.e., without this patch). We'll need some tests that exercise the patch before it can be committed.

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
1346	I think the 'or' is misleading, as we're looking for a base register followed by an immediate offset in both the non-paired and paired cases: 'ldr x1, [x0, #imm]' or 'ldp x1, x2, [x0, #imm]'. Therefore, I'd suggest leaving this comment as is and adding comments below.
1348	// Non-paired instruction (e.g., ldr x1, [x0, #8]).
1351	// Paired instruction (e.g., ldp x1, x2, [x0, #8]).
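To make the two operand shapes concrete, here is a minimal standalone sketch. It is not LLVM code: `Operand`, `Inst`, and `findBaseAndImm` are hypothetical stand-ins for MachineOperand, MachineInstr, and the check under review, and the operand layout is assumed to be the one shown in the comments above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-ins for MachineOperand / MachineInstr (hypothetical, not the LLVM API).
struct Operand {
  enum Kind { Reg, Imm } kind;
  int64_t value; // register number or immediate
};
struct Inst {
  std::vector<Operand> ops;
};

// Non-paired: ldr x1, [x0, #imm]      -> operands {x1, x0, imm}
// Paired:     ldp x1, x2, [x0, #imm]  -> operands {x1, x2, x0, imm}
// In both shapes the base register is second-to-last and the immediate is last.
bool findBaseAndImm(const Inst &I, int64_t &BaseReg, int64_t &Imm) {
  const std::size_t N = I.ops.size();
  if (N != 3 && N != 4)
    return false;
  assert(N >= 2 && "need at least a base register and an immediate");
  // Every operand except the last must be a register.
  for (std::size_t i = 0; i + 1 < N; ++i)
    if (I.ops[i].kind != Operand::Reg)
      return false;
  if (I.ops[N - 1].kind != Operand::Imm)
    return false;
  BaseReg = I.ops[N - 2].value;
  Imm = I.ops[N - 1].value;
  return true;
}
```

Under these assumptions the same function accepts both forms, which is why the original "base register followed by immediate offset" wording still describes the paired case.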

This revision now requires changes to proceed. (Feb 17 2016, 7:20 AM)

mcrosier added inline comments. (Feb 17 2016, 7:30 AM)

llvm/test/CodeGen/AArch64/ldst-pairing.ll
1	You may need to disable the MI scheduler for this to work. I believe you can do this by specifying -disable-post-ra on the llc command line.

mcrosier added a reviewer: t.p.northover. (Feb 17 2016, 7:54 AM)

evandro added a subscriber: evandro. (Feb 18 2016, 1:41 PM)

LGTM, assuming there are no correctness issues.

Please commit only the st2 test, as the other tests are either not related to the patch or are redundant with st2. Also, the test can be included in arm64-stp-aa.ll rather than creating a new file.

llvm/test/CodeGen/AArch64/ldst-pairing.ll
8	This tests the most basic pairing capabilities and is not relevant to the patch at hand (i.e., this passes without this patch).
30	This is the most relevant test to the commit. To properly test the patch the pre-RA MI scheduler must be disabled (-enable-misched=false) and D17097 must also be applied.
62	Same story as st1. This tests the most basic pairing capabilities and is not relevant to the patch at hand (i.e., this passes without this patch).
81	This is redundant and not needed.
103	This test is not relevant to this patch. It can be committed separately with a FIXME comment in AArch64LoadStoreOptimizer.cpp.

This revision is now accepted and ready to land. (Mar 7 2016, 10:26 AM)

I wanted to push this patch along, so I ran some correctness tests. It turns out the paired instructions aren't being scaled properly (see inline comments). After addressing those issues I'm now seeing a few regressions due to fewer stp instructions. I'm going to block this patch while I investigate.

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
1405	Scale = 16; Width = 32;
1414	Scale = 8; Width = 16;
1426	Scale = 4; Width = 8;
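The distinction behind these three corrections can be sketched in a few lines. This is a standalone illustration, not LLVM code: `MemAccess` and `pairedAccess` are hypothetical names, and the sketch assumes the paired instruction's immediate is counted in units of a single register, as the suggested values imply.

```cpp
#include <cassert>

// Hypothetical helper, not part of LLVM: byte offset and access width of a
// paired load/store whose registers are each RegBytes wide.
struct MemAccess {
  int Offset; // byte offset from the base register
  int Width;  // total number of bytes accessed
};

MemAccess pairedAccess(int Imm, int RegBytes) {
  assert(RegBytes == 4 || RegBytes == 8 || RegBytes == 16);
  const int Scale = RegBytes;     // the immediate counts single registers
  const int Width = 2 * RegBytes; // but the instruction touches two of them
  return {Imm * Scale, Width};
}
```

For a Q-register pair with an immediate operand of 2, `pairedAccess(2, 16)` yields a byte offset of 32 and a width of 32. With Scale = Width = 32 the same immediate would have been read as byte offset 64.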

This revision now requires changes to proceed. (Mar 8 2016, 7:30 AM)

PR26879 may be part of the problem.
https://llvm.org/bugs/show_bug.cgi?id=26879

mcrosier commandeered this revision. (Mar 8 2016, 11:57 AM)

mcrosier edited reviewers, added: sebpop; removed: mcrosier.

Correct scale/offset issues and address all previous feedback.

In D17098#370005, @mcrosier wrote:

Correct scale/offset issues and address all previous feedback.

The corrected version looks very similar to what I remember having implemented first, and then we found out that the scale did not match.
Let's not apply this patch yet: I need to discuss this again with my colleague Abderrazek and we will try to find why we changed to scale = width for ldp/stp formation. We will come back with a counter-example if we find one.

In D17098#370320, @sebpop wrote:

In D17098#370005, @mcrosier wrote:

Correct scale/offset issues and address all previous feedback.

The corrected version looks very similar to what I remember having implemented first, and then we found out that the scale did not match.
Let's not apply this patch yet: I need to discuss this again with my colleague Abderrazek and we will try to find why we changed to scale = width for ldp/stp formation. We will come back with a counter-example if we find one.

Sounds good. Let me know if I've missed something. The matching scale and offsets caused a large number of correctness failures in the test-suite as well as SPEC200X in my testing.

In D17098#370340, @mcrosier wrote:

Let me know if I've missed something. The matching scale and offsets caused a large number of correctness failures in the test-suite as well as SPEC200X in my testing.

Chad, your changes to the patch are correct: the ARM documentation states that in the case of a pair of 32bit regs the immediate byte offset is a multiple of 4.
The patch LGTM.

hiraditya added a subscriber: hiraditya. (Mar 22 2016, 9:38 PM)

Ping.
Abderrazek has looked at the code produced with the patch on the xalan benchmark
and found that with the patch there are 13 more stp instructions, not fewer.
Ok to commit?

haicheng added a subscriber: haicheng. (Apr 7 2016, 8:33 AM)

Chad, is it ok to commit this patch?

Please rebase it. LGTM.

lib/Target/AArch64/AArch64InstrInfo.cpp
1351 ↗	(On Diff #50064)	Please use getNumExplicitOperands().
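A likely motivation for this suggestion, stated as an assumption rather than as the reviewer's own words: MachineInstr::getNumOperands() counts implicit operands as well, so an exact comparison against 3 or 4 can fail for an instruction that carries extra implicit register operands. The toy model below illustrates the difference; `Operand`, `Inst`, and `hasLdStShape` are hypothetical stand-ins, not the LLVM classes.

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins (hypothetical, not the LLVM classes).
struct Operand {
  bool IsImplicit;
};
struct Inst {
  std::vector<Operand> Ops;

  // Counts every operand, implicit ones included.
  unsigned numOperands() const { return static_cast<unsigned>(Ops.size()); }

  // Counts only the operands written in the instruction itself.
  unsigned numExplicitOperands() const {
    unsigned N = 0;
    for (const Operand &Op : Ops)
      if (!Op.IsImplicit)
        ++N;
    return N;
  }
};

// Shape check in the spirit of the patch: 3 explicit operands for a single
// load/store, 4 for a pair.
bool hasLdStShape(const Inst &I) {
  const unsigned N = I.numExplicitOperands();
  return N == 3 || N == 4;
}
```

In this model a pair with two additional implicit operands reports six operands in total but still passes the shape check, because only the four explicit ones are counted.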

In D17098#402547, @junbuml wrote:

Please rebase it. LGTM.

Approved per Jun and Sebastian. Jun verified the regression I was seeing no longer exists. I'll go ahead and rebase the patch and commit with Jun's suggestion to use getNumExplicitOperands(). Thanks, Jun/Sebastian.

This revision is now accepted and ready to land. (Apr 15 2016, 9:10 AM)

Committed in r266462.

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp	46 lines
llvm/test/CodeGen/AArch64/aarch64-a57-fp-load-balancing.ll	20 lines
llvm/test/CodeGen/AArch64/arm64-alloc-no-stack-realign.ll	2 lines
llvm/test/CodeGen/AArch64/fastcc.ll	6 lines
llvm/test/CodeGen/AArch64/ldst-pairing.ll	116 lines

Diff 47602

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp

[1,336 lines hidden; hunk context: case AArch64::LDRWui:]
Offset = LdSt->getOperand(2).getImm() * Width;		Offset = LdSt->getOperand(2).getImm() * Width;
return true;		return true;
};		};
}		}

bool AArch64InstrInfo::getMemOpBaseRegImmOfsWidth(		bool AArch64InstrInfo::getMemOpBaseRegImmOfsWidth(
MachineInstr *LdSt, unsigned &BaseReg, int &Offset, int &Width,		MachineInstr *LdSt, unsigned &BaseReg, int &Offset, int &Width,
const TargetRegisterInfo *TRI) const {		const TargetRegisterInfo *TRI) const {
// Handle only loads/stores with base register followed by immediate offset.		// Handle only loads/stores with base register followed by immediate offset
if (LdSt->getNumOperands() != 3)		// or pairs.
		[Inline comment, mcrosier (Not Done)] I think the 'or' is misleading, as we're looking for a base register followed by an immediate offset in both the non-paired and paired cases: 'ldr x1, [x0, #imm]' or 'ldp x1, x2, [x0, #imm]'. Therefore, I'd suggest leaving this comment as is and adding comments below.
return false;		if (LdSt->getNumOperands() == 3) {
if (!LdSt->getOperand(1).isReg() || !LdSt->getOperand(2).isImm())		if (!LdSt->getOperand(1).isReg() || !LdSt->getOperand(2).isImm())
		[Inline comment, mcrosier (Not Done)] // Non-paired instruction (e.g., ldr x1, [x0, #8]).
return false;		return false;
		} else if (LdSt->getNumOperands() == 4) {
		if (!LdSt->getOperand(1).isReg() || !LdSt->getOperand(2).isReg() || !LdSt->getOperand(3).isImm())
		[Inline comment, mcrosier (Not Done)] // Paired instruction (e.g., ldp x1, x2, [x0, #8]).
		return false;
		} else {
		return false;
		}

// Offset is calculated as the immediate operand multiplied by the scaling factor.		// Offset is calculated as the immediate operand multiplied by the scaling factor.
// Unscaled instructions have scaling factor set to 1.		// Unscaled instructions have scaling factor set to 1.
int Scale = 0;		int Scale = 0;
switch (LdSt->getOpcode()) {		switch (LdSt->getOpcode()) {
default:		default:
return false;		return false;
case AArch64::LDURQi:		case AArch64::LDURQi:
[29 lines hidden; hunk context: bool AArch64InstrInfo::getMemOpBaseRegImmOfsWidth(]
case AArch64::LDURBBi:		case AArch64::LDURBBi:
case AArch64::LDURSBXi:		case AArch64::LDURSBXi:
case AArch64::LDURSBWi:		case AArch64::LDURSBWi:
case AArch64::STURBi:		case AArch64::STURBi:
case AArch64::STURBBi:		case AArch64::STURBBi:
Width = 1;		Width = 1;
Scale = 1;		Scale = 1;
break;		break;
		case AArch64::LDPQi:
		case AArch64::STPQi:
		case AArch64::LDNPQi:
		case AArch64::STNPQi:
		Scale = Width = 32;
		[Inline comment, mcrosier (Not Done)] Scale = 16; Width = 32;
		break;
		case AArch64::LDPXi:
		case AArch64::LDPDi:
		case AArch64::STPXi:
		case AArch64::STPDi:
		case AArch64::LDNPXi:
		case AArch64::LDNPDi:
		case AArch64::STNPXi:
		case AArch64::STNPDi:
		[Inline comment, mcrosier (Not Done)] Scale = 8; Width = 16;
case AArch64::LDRQui:		case AArch64::LDRQui:
case AArch64::STRQui:		case AArch64::STRQui:
Scale = Width = 16;		Scale = Width = 16;
break;		break;
		case AArch64::LDPWi:
		case AArch64::STPWi:
		case AArch64::LDPSi:
		case AArch64::STPSi:
		case AArch64::LDNPWi:
		case AArch64::STNPWi:
		case AArch64::LDNPSi:
		case AArch64::STNPSi:
		[Inline comment, mcrosier (Not Done)] Scale = 4; Width = 8;
case AArch64::LDRXui:		case AArch64::LDRXui:
case AArch64::LDRDui:		case AArch64::LDRDui:
case AArch64::STRXui:		case AArch64::STRXui:
case AArch64::STRDui:		case AArch64::STRDui:
Scale = Width = 8;		Scale = Width = 8;
break;		break;
case AArch64::LDRWui:		case AArch64::LDRWui:
case AArch64::LDRSui:		case AArch64::LDRSui:
[10 lines hidden; hunk context: bool AArch64InstrInfo::getMemOpBaseRegImmOfsWidth(]
case AArch64::LDRBui:		case AArch64::LDRBui:
case AArch64::LDRBBui:		case AArch64::LDRBBui:
case AArch64::STRBui:		case AArch64::STRBui:
case AArch64::STRBBui:		case AArch64::STRBBui:
Scale = Width = 1;		Scale = Width = 1;
break;		break;
}		}

		if (LdSt->getNumOperands() == 3) {
BaseReg = LdSt->getOperand(1).getReg();		BaseReg = LdSt->getOperand(1).getReg();
Offset = LdSt->getOperand(2).getImm() * Scale;		Offset = LdSt->getOperand(2).getImm() * Scale;
		} else {
		[Inline comment, qcolombet (Done)] s/else if/else/ and move the condition of the if into an assert.
		assert(LdSt->getNumOperands() == 4 && "invalid number of operands");
		BaseReg = LdSt->getOperand(2).getReg();
		Offset = LdSt->getOperand(3).getImm() * Scale;
		}
return true;		return true;
}		}

/// Detect opportunities for ldp/stp formation.		/// Detect opportunities for ldp/stp formation.
///		///
/// Only called for LdSt for which getMemOpBaseRegImmOfs returns true.		/// Only called for LdSt for which getMemOpBaseRegImmOfs returns true.
bool AArch64InstrInfo::shouldClusterLoads(MachineInstr *FirstLdSt,		bool AArch64InstrInfo::shouldClusterLoads(MachineInstr *FirstLdSt,
MachineInstr *SecondLdSt,		MachineInstr *SecondLdSt,
[1,618 lines hidden]

llvm/test/CodeGen/AArch64/aarch64-a57-fp-load-balancing.ll

[61 lines hidden; hunk context: entry:]
store double %add19, double* %arrayidx20, align 8		store double %add19, double* %arrayidx20, align 8
ret void		ret void
}		}

; Overlapping groups - coloring needed.		; Overlapping groups - coloring needed.

; CHECK-LABEL: f2:		; CHECK-LABEL: f2:
; CHECK-EVEN: fmadd [[x:d[0-9]*[02468]]]		; CHECK-EVEN: fmadd [[x:d[0-9]*[02468]]]
; CHECK-EVEN: fmul [[y:d[0-9]*[13579]]]
; CHECK-ODD: fmadd [[x:d[0-9]*[13579]]]		; CHECK-ODD: fmadd [[x:d[0-9]*[13579]]]
		; CHECK-A57: fmadd [[x]]
		; CHECK-EVEN: fmul [[y:d[0-9]*[13579]]]
; CHECK-ODD: fmul [[y:d[0-9]*[02468]]]		; CHECK-ODD: fmul [[y:d[0-9]*[02468]]]
; CHECK: fmadd [[x]]
; CHECK: fmadd [[y]]		; CHECK: fmadd [[y]]
		; CHECK-A53: fmadd [[x]]
		; CHECK-A53: fmadd [[y]]
; CHECK: fmsub [[x]]		; CHECK: fmsub [[x]]
; CHECK: fmadd [[y]]		; CHECK-A57: fmadd [[y]]
		; CHECK-A53-DAG: str [[y]]
; CHECK: fmadd [[x]]		; CHECK: fmadd [[x]]
; CHECK-A57: stp [[x]], [[y]]		; CHECK-A57: stp [[x]], [[y]]
; CHECK-A53-DAG: str [[x]]		; CHECK-A53-DAG: str [[x]]
; CHECK-A53-DAG: str [[y]]

define void @f2(double* nocapture readonly %p, double* nocapture %q) #0 {		define void @f2(double* nocapture readonly %p, double* nocapture %q) #0 {
entry:		entry:
%0 = load double, double* %p, align 8		%0 = load double, double* %p, align 8
%arrayidx1 = getelementptr inbounds double, double* %p, i64 1		%arrayidx1 = getelementptr inbounds double, double* %p, i64 1
%1 = load double, double* %arrayidx1, align 8		%1 = load double, double* %arrayidx1, align 8
%arrayidx2 = getelementptr inbounds double, double* %p, i64 2		%arrayidx2 = getelementptr inbounds double, double* %p, i64 2
%2 = load double, double* %arrayidx2, align 8		%2 = load double, double* %arrayidx2, align 8
[68 lines hidden]
}		}

declare void @g(...) #1		declare void @g(...) #1

; Single precision version of f2.		; Single precision version of f2.

; CHECK-LABEL: f4:		; CHECK-LABEL: f4:
; CHECK-EVEN: fmadd [[x:s[0-9]*[02468]]]		; CHECK-EVEN: fmadd [[x:s[0-9]*[02468]]]
; CHECK-EVEN: fmul [[y:s[0-9]*[13579]]]
; CHECK-ODD: fmadd [[x:s[0-9]*[13579]]]		; CHECK-ODD: fmadd [[x:s[0-9]*[13579]]]
		; CHECK-A57: fmadd [[x]]
		; CHECK-EVEN: fmul [[y:s[0-9]*[13579]]]
; CHECK-ODD: fmul [[y:s[0-9]*[02468]]]		; CHECK-ODD: fmul [[y:s[0-9]*[02468]]]
; CHECK: fmadd [[x]]
; CHECK: fmadd [[y]]		; CHECK: fmadd [[y]]
		; CHECK-A53: fmadd [[x]]
		; CHECK-A53: fmadd [[y]]
; CHECK: fmsub [[x]]		; CHECK: fmsub [[x]]
; CHECK: fmadd [[y]]		; CHECK-A57: fmadd [[y]]
		; CHECK-A53-DAG: str [[y]]
; CHECK: fmadd [[x]]		; CHECK: fmadd [[x]]
; CHECK-A57: stp [[x]], [[y]]		; CHECK-A57: stp [[x]], [[y]]
; CHECK-A53-DAG: str [[x]]		; CHECK-A53-DAG: str [[x]]
; CHECK-A53-DAG: str [[y]]

define void @f4(float* nocapture readonly %p, float* nocapture %q) #0 {		define void @f4(float* nocapture readonly %p, float* nocapture %q) #0 {
entry:		entry:
%0 = load float, float* %p, align 4		%0 = load float, float* %p, align 4
%arrayidx1 = getelementptr inbounds float, float* %p, i64 1		%arrayidx1 = getelementptr inbounds float, float* %p, i64 1
%1 = load float, float* %arrayidx1, align 4		%1 = load float, float* %arrayidx1, align 4
%arrayidx2 = getelementptr inbounds float, float* %p, i64 2		%arrayidx2 = getelementptr inbounds float, float* %p, i64 2
%2 = load float, float* %arrayidx2, align 4		%2 = load float, float* %arrayidx2, align 4
[146 lines hidden]

llvm/test/CodeGen/AArch64/arm64-alloc-no-stack-realign.ll

	; RUN: llc < %s -mtriple=arm64-apple-darwin -enable-misched=false | FileCheck %s			; RUN: llc < %s -mtriple=arm64-apple-darwin -enable-misched=false | FileCheck %s

	; rdar://12713765			; rdar://12713765
	; Make sure we are not creating stack objects that are assumed to be 64-byte			; Make sure we are not creating stack objects that are assumed to be 64-byte
	; aligned.			; aligned.
	@T3_retval = common global <16 x float> zeroinitializer, align 16			@T3_retval = common global <16 x float> zeroinitializer, align 16

	define void @test(<16 x float>* noalias sret %agg.result) nounwind ssp {			define void @test(<16 x float>* noalias sret %agg.result) nounwind ssp {
	entry:			entry:
	; CHECK: test			; CHECK: test
	; CHECK: stp [[Q1:q[0-9]+]], [[Q2:q[0-9]+]], [sp, #32]
	; CHECK: stp [[Q1:q[0-9]+]], [[Q2:q[0-9]+]], [sp]			; CHECK: stp [[Q1:q[0-9]+]], [[Q2:q[0-9]+]], [sp]
				; CHECK: stp [[Q1:q[0-9]+]], [[Q2:q[0-9]+]], [sp, #32]
	; CHECK: stp [[Q1:q[0-9]+]], [[Q2:q[0-9]+]], {{\[}}[[BASE:x[0-9]+]], #32]			; CHECK: stp [[Q1:q[0-9]+]], [[Q2:q[0-9]+]], {{\[}}[[BASE:x[0-9]+]], #32]
	; CHECK: stp [[Q1:q[0-9]+]], [[Q2:q[0-9]+]], {{\[}}[[BASE]]]			; CHECK: stp [[Q1:q[0-9]+]], [[Q2:q[0-9]+]], {{\[}}[[BASE]]]
	%retval = alloca <16 x float>, align 16			%retval = alloca <16 x float>, align 16
	%0 = load <16 x float>, <16 x float>* @T3_retval, align 16			%0 = load <16 x float>, <16 x float>* @T3_retval, align 16
	store <16 x float> %0, <16 x float>* %retval			store <16 x float> %0, <16 x float>* %retval
	%1 = load <16 x float>, <16 x float>* %retval			%1 = load <16 x float>, <16 x float>* %retval
	store <16 x float> %1, <16 x float>* %agg.result, align 16			store <16 x float> %1, <16 x float>* %agg.result, align 16
	ret void			ret void
	}			}

llvm/test/CodeGen/AArch64/fastcc.ll

[14 lines hidden]
; CHECK-TAIL: str w{{[0-9]+}}, [sp, #-32]!		; CHECK-TAIL: str w{{[0-9]+}}, [sp, #-32]!


call fastcc void @func_stack8([8 x i32] undef, i32 42)		call fastcc void @func_stack8([8 x i32] undef, i32 42)
; CHECK: bl func_stack8		; CHECK: bl func_stack8
; CHECK-NOT: sub sp, sp,		; CHECK-NOT: sub sp, sp,

; CHECK-TAIL: bl func_stack8		; CHECK-TAIL: bl func_stack8
; CHECK-TAIL: sub sp, sp, #16		; CHECK-TAIL: stp xzr, xzr, [sp, #-16]!


call fastcc void @func_stack32([8 x i32] undef, i128 0, i128 9)		call fastcc void @func_stack32([8 x i32] undef, i128 0, i128 9)
; CHECK: bl func_stack32		; CHECK: bl func_stack32
; CHECK-NOT: sub sp, sp,		; CHECK-NOT: sub sp, sp,


; CHECK-TAIL: bl func_stack32		; CHECK-TAIL: bl func_stack32
[33 lines hidden; hunk context: ; CHECK-TAIL: str w{{[0-9]+}}, [sp, #-32]!]


call fastcc void @func_stack8([8 x i32] undef, i32 42)		call fastcc void @func_stack8([8 x i32] undef, i32 42)
; CHECK: bl func_stack8		; CHECK: bl func_stack8
; CHECK-NOT: sub sp, sp,		; CHECK-NOT: sub sp, sp,


; CHECK-TAIL: bl func_stack8		; CHECK-TAIL: bl func_stack8
; CHECK-TAIL: sub sp, sp, #16		; CHECK-TAIL: stp xzr, xzr, [sp, #-16]!


call fastcc void @func_stack32([8 x i32] undef, i128 0, i128 9)		call fastcc void @func_stack32([8 x i32] undef, i128 0, i128 9)
; CHECK: bl func_stack32		; CHECK: bl func_stack32
; CHECK-NOT: sub sp, sp,		; CHECK-NOT: sub sp, sp,


; CHECK-TAIL: bl func_stack32		; CHECK-TAIL: bl func_stack32
[26 lines hidden]
; CHECK-TAIL: mov x29, sp		; CHECK-TAIL: mov x29, sp


call fastcc void @func_stack8([8 x i32] undef, i32 42)		call fastcc void @func_stack8([8 x i32] undef, i32 42)
; CHECK: bl func_stack8		; CHECK: bl func_stack8
; CHECK-NOT: sub sp, sp,		; CHECK-NOT: sub sp, sp,

; CHECK-TAIL: bl func_stack8		; CHECK-TAIL: bl func_stack8
; CHECK-TAIL: sub sp, sp, #16		; CHECK-TAIL: stp xzr, xzr, [sp, #-16]!


call fastcc void @func_stack32([8 x i32] undef, i128 0, i128 9)		call fastcc void @func_stack32([8 x i32] undef, i128 0, i128 9)
; CHECK: bl func_stack32		; CHECK: bl func_stack32
; CHECK-NOT: sub sp, sp,		; CHECK-NOT: sub sp, sp,


; CHECK-TAIL: bl func_stack32		; CHECK-TAIL: bl func_stack32
[20 lines hidden]

llvm/test/CodeGen/AArch64/ldst-pairing.ll

This file was added.

				; RUN: llc < %s -march=arm64 -mtriple=aarch64-none-linux-gnu | FileCheck %s
				[Inline comment, mcrosier (Not Done)] You may need to disable the MI scheduler for this to work. I believe you can do this by specifying -disable-post-ra on the llc command line.

				; st1 checks that the stores are paired when appearing in order %a, %b, %c, %d.
				;
				[Inline comment, qcolombet (Done)] CHECK-LABEL: + Give a description of what is tested.
				; CHECK-LABEL: st1:
				[Inline comment, qcolombet (Done)] Could you be more specific? In particular, explain the difference between @st1, @st2, and so on. Put differently, what is relevant in this test that was not previously tested? The FIXME in st5 is a good example: we check the merging of stores when the accesses are not in increasing order of address.
				; CHECK: stp
				; CHECK: stp
				define void @st1(<4 x float> %a, <4 x float> %b, <4 x float> %c, <4 x float> %d, <4 x float> * %base, i64 %index) {
				[Inline comment, mcrosier (Not Done)] This tests the most basic pairing capabilities and is not relevant to the patch at hand (i.e., this passes without this patch).
				entry:
				%a1 = getelementptr inbounds <4 x float>, <4 x float>* %base, i64 %index
				%b1 = getelementptr <4 x float>, <4 x float>* %a1, i64 1
				%c1 = getelementptr <4 x float>, <4 x float>* %a1, i64 2
				%d1 = getelementptr <4 x float>, <4 x float>* %a1, i64 3

				store <4 x float> %a, <4 x float> * %a1
				store <4 x float> %b, <4 x float> * %b1
				store <4 x float> %c, <4 x float> * %c1
				store <4 x float> %d, <4 x float> * %d1

				ret void
				}
				[Inline comment, qcolombet (Done)] ditto

				; st2 checks that the stores %c and %d are paired after the fadd instruction,
				; and then the stores %a and %d are paired after proving that they do not
				; depend on the (%c, %d) pair.
				;
				; CHECK-LABEL: st2:
				; CHECK: stp
				; CHECK: stp
				define void @st2(<4 x float> %a, <4 x float> %b, <4 x float> %c, <4 x float> %d, float* %base, i64 %index) {
				[Inline comment, mcrosier (Not Done)] This is the most relevant test to the commit. To properly test the patch the pre-RA MI scheduler must be disabled (-enable-misched=false) and D17097 must also be applied.
				entry:
				%a0 = getelementptr inbounds float, float* %base, i64 %index
				%b0 = getelementptr float, float* %a0, i64 4
				%c0 = getelementptr float, float* %a0, i64 8
				%d0 = getelementptr float, float* %a0, i64 12

				%a1 = bitcast float* %a0 to <4 x float>*
				%b1 = bitcast float* %b0 to <4 x float>*
				%c1 = bitcast float* %c0 to <4 x float>*
				%d1 = bitcast float* %d0 to <4 x float>*

				store <4 x float> %c, <4 x float> * %c1, align 4
				store <4 x float> %a, <4 x float> * %a1, align 4

				; This fadd forces the compiler to pair %c and %e after fadd, and leave the
				; stores %a and %b separated by a stp. The dependence analysis needs then to
				; prove that it is safe to move %b past the stp to be paired with %a.
				%e = fadd fast <4 x float> %d, %a

				[Inline comment, qcolombet (Done)] ditto
				store <4 x float> %e, <4 x float> * %d1, align 4
				store <4 x float> %b, <4 x float> * %b1, align 4

				ret void
				}

				; st3 checks that the stores are paired when they appear interleaved and in
				; reverse order.
				;
				; CHECK-LABEL: st3:
				; CHECK: stp
				; CHECK: stp
				define void @st3(<4 x float> %a, <4 x float> %b, <4 x float> %c, <4 x float> %d, <4 x float> * %base, i64 %index) {
				[Inline comment, mcrosier (Not Done)] Same story as st1. This tests the most basic pairing capabilities and is not relevant to the patch at hand (i.e., this passes without this patch).
				entry:
				%a1 = getelementptr inbounds <4 x float>, <4 x float>* %base, i64 %index
				%b1 = getelementptr <4 x float>, <4 x float>* %a1, i64 1
				%c1 = getelementptr <4 x float>, <4 x float>* %a1, i64 2
				%d1 = getelementptr <4 x float>, <4 x float>* %a1, i64 3

				store <4 x float> %a, <4 x float> * %a1
				store <4 x float> %d, <4 x float> * %d1
				store <4 x float> %c, <4 x float> * %c1
				store <4 x float> %b, <4 x float> * %b1

				ret void
				}

				; st4 is like st2 except that the stores are in reverse order.
				; CHECK-LABEL: st4:
				; CHECK: stp
				; CHECK: stp
				define void @st4(<4 x float> %a, <4 x float> %b, <4 x float> %c, <4 x float> %d, <4 x float> * %base, i64 %index) {
				[Inline comment, mcrosier (Not Done)] This is redundant and not needed.
				entry:
				%a1 = getelementptr inbounds <4 x float>, <4 x float>* %base, i64 %index
				%b1 = getelementptr <4 x float>, <4 x float>* %a1, i64 1
				%c1 = getelementptr <4 x float>, <4 x float>* %a1, i64 2
				%d1 = getelementptr <4 x float>, <4 x float>* %a1, i64 3

				store <4 x float> %d, <4 x float> * %d1
				store <4 x float> %b, <4 x float> * %b1

				%e = fadd fast <4 x float> %c, %a

				store <4 x float> %e, <4 x float> * %c1
				store <4 x float> %a, <4 x float> * %a1

				ret void
				}

				; FIXME: st5 should contain two stp. The current pairing algorithm is greedy and
				; is pairing %c and %b, leaving the two other stores %a and %d non pairable.
				; CHECK-LABEL: st5:
				; CHECK: stp
				define void @st5(<4 x float> %a, <4 x float> %b, <4 x float> %c, <4 x float> %d, <4 x float> * %base, i64 %index) {
				[Inline comment, mcrosier (Not Done)] This test is not relevant to this patch. It can be committed separately with a FIXME comment in AArch64LoadStoreOptimizer.cpp.
				entry:
				%a1 = getelementptr inbounds <4 x float>, <4 x float>* %base, i64 %index
				%b1 = getelementptr <4 x float>, <4 x float>* %a1, i64 1
				%c1 = getelementptr <4 x float>, <4 x float>* %a1, i64 2
				%d1 = getelementptr <4 x float>, <4 x float>* %a1, i64 3

				store <4 x float> %c, <4 x float> * %c1
				store <4 x float> %a, <4 x float> * %a1
				store <4 x float> %b, <4 x float> * %b1
				store <4 x float> %d, <4 x float> * %d1

				ret void
				}
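
The FIXME on st5 describes a greedy choice that leaves two stores unpaired. The toy model below reproduces that effect under stated assumptions. It is not the algorithm in AArch64LoadStoreOptimizer.cpp: `greedyPairCount` is a hypothetical function that walks stores in program order and pairs each unpaired store with the first later unpaired store exactly one access width away.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Toy model (not the AArch64LoadStoreOptimizer algorithm): Offsets holds the
// byte offset of each store in program order. Returns the number of pairs a
// first-match greedy scan forms.
int greedyPairCount(const std::vector<int> &Offsets, int Width) {
  std::vector<bool> Paired(Offsets.size(), false);
  int Pairs = 0;
  for (std::size_t I = 0; I < Offsets.size(); ++I) {
    if (Paired[I])
      continue;
    for (std::size_t J = I + 1; J < Offsets.size(); ++J) {
      if (Paired[J])
        continue;
      // Adjacent in memory: exactly one access width apart, in either order.
      if (std::abs(Offsets[I] - Offsets[J]) == Width) {
        Paired[I] = Paired[J] = true;
        ++Pairs;
        break;
      }
    }
  }
  return Pairs;
}
```

With 16-byte stores, the st5 order (%c, %a, %b, %d, i.e. offsets 32, 0, 16, 48) gives one pair in this model, because %c takes %b and leaves %a and %d non-adjacent. The st1 order (0, 16, 32, 48) gives two.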


[AArch64] analyse dependences of ldp/stp (Closed, Public)
