This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Backedge indexing
Abandoned, Public

Authored by SjoerdMeijer on Oct 21 2020, 9:19 AM.

Details

Summary

This is a bit of a brain dump: I was seeing the same problems with addressing modes in unrolled loops. It is directly related to what @SjoerdMeijer is currently working on in D89693, and I doubt I will have time to look into this any further...

For the benchmark that I am looking at, the total size shrinks, but there seems to be a problem: we no longer generate the LDPs (which I presume is just a current limitation of the AArch64LoadStoreOptimizer?):

< 	ldp	q0, q2, [x2, #-16]
< 	ldp	q1, q3, [x4, #-16]
< 	subs	x5, x5, #8                      // =8
< 	add	x4, x4, #32                     // =32
< 	add	x2, x2, #32                     // =32
< 	fmul	v0.4s, v0.4s, v1.4s
< 	fmul	v2.4s, v2.4s, v3.4s
< 	ldp	q1, q3, [x3, #-16]
< 	fadd	v0.4s, v1.4s, v0.4s
< 	fadd	v1.4s, v3.4s, v2.4s
< 	stp	q0, q1, [x3, #-16]
< 	add	x3, x3, #32                     // =32
---
> 	ldr	q0, [x5, #32]!
> 	subs	x27, x27, #8                    // =8
> 	ldur	q1, [x5, #-16]
> 	ldr	q2, [x7, #32]!
> 	ldur	q3, [x7, #-16]
> 	ldr	q4, [x6, #32]!
> 	fmul	v0.4s, v0.4s, v2.4s
> 	fmul	v1.4s, v1.4s, v3.4s
> 	ldr	q2, [x6, #16]
> 	fadd	v1.4s, v4.4s, v1.4s
> 	fadd	v0.4s, v2.4s, v0.4s
> 	stp	q1, q0, [x6]
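The benchmark source is not part of this revision, so the following is only a guess at the kind of loop that would produce the listings above: each iteration loads from two input pointers, multiplies, adds the product to a value loaded from a third pointer, and stores the result back through that third pointer. The function name `mul_add`, its signature, and the `restrict` qualifiers are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reconstruction: c[i] += a[i] * b[i] over floats.
 * The name, signature and restrict qualifiers are illustrative only;
 * they are not taken from the benchmark. */
void mul_add(float *restrict c, const float *restrict a,
             const float *restrict b, size_t n)
{
    /* Both listings decrement the counter by 8 per iteration, which is
     * consistent with 4-wide float vectors unrolled by 2. This sketch
     * assumes no remainder loop is needed. */
    assert(n % 8 == 0);

    for (size_t i = 0; i < n; i += 8) {
        /* One iteration covers two adjacent 16-byte vectors per array,
         * i.e. the pair of q registers that the first listing loads
         * with a single LDP. */
        for (size_t j = 0; j < 8; ++j)
            c[i + j] += a[i + j] * b[i + j];
    }
}
```

In the second listing, the pre-indexed `ldr q0, [x5, #32]!` first advances `x5` by 32 and then loads from the updated address, and the following `ldur q1, [x5, #-16]` loads the 16 bytes immediately below it. The two loads therefore still touch adjacent memory, just as the LDP did, but they are no longer combined into a pair.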

Event Timeline

samparker created this revision. Oct 21 2020, 9:19 AM
samparker requested review of this revision. Oct 21 2020, 9:19 AM
SjoerdMeijer commandeered this revision. Edited Mar 1 2021, 8:35 AM
SjoerdMeijer abandoned this revision.
SjoerdMeijer edited reviewers, added: samparker; removed: SjoerdMeijer.

I am abandoning this in favour of D89693, which I have repurposed to address this, because most of the discussions happened there.