This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] analyse dependences of ldp/stp
ClosedPublic

Authored by mcrosier on Feb 10 2016, 2:09 PM.

Details

Summary

Add load and store pairs to the dependence analysis of offset and width.
This fixes PR26358.

Patch by Sebastian Pop and Abderrazek Zaafrani.

Diff Detail

Event Timeline

sebpop updated this revision to Diff 47519.Feb 10 2016, 2:09 PM
sebpop retitled this revision from to [AArch64] analyse dependences of ldp/stp.
sebpop updated this object.
sebpop added reviewers: rengolin, kristof.beyls.
sebpop added a subscriber: llvm-commits.
az added a subscriber: az.Feb 10 2016, 2:13 PM
qcolombet added inline comments.
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
1456

s/else if/else/
And move the condition of the if in an assert.

llvm/test/CodeGen/AArch64/ldst-pairing.ll
4

CHECK-LABEL:

+ Give a description of what is tested.

21

ditto

49

ditto

sebpop updated this revision to Diff 47543.Feb 10 2016, 3:31 PM

Address comments from Quentin.

qcolombet added inline comments.Feb 10 2016, 3:44 PM
llvm/test/CodeGen/AArch64/ldst-pairing.ll
5

Could you be more specific?
In particular, explain what is the difference between @st1, @st2, and so on.
Put differently what is relevant in this test that was not previously tested.

For instance, the FIXME in st5 is a good example. We check the merging of stores when the access are not in increasing order of the address location.

Please add the non-temporal ldp/stp instructions. This should probably land after D17097.

sebpop updated this revision to Diff 47602.Feb 10 2016, 11:33 PM
sebpop updated this object.

Add nontemporal, update testcase comments.

sebpop marked 5 inline comments as done.Feb 10 2016, 11:34 PM
mcrosier requested changes to this revision.Feb 17 2016, 7:20 AM
mcrosier added a reviewer: mcrosier.

All of the provided tests pass on current top of trunk (i.e., without this patch). We'll need some tests that exercise the patch before it can be committed.

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
1346

I think the 'or' is misleading as we're looking for a base register followed by an immediate offset in both the non-paired and paired cases.

We're looking for 'ldr x1, [x0, #imm]' or 'ldp x1, x2, [x0, #imm]'

Therefore, I'd suggest leaving this comment as is and add comments below.

1348

// Non-paired instruction (e.g., ldr x1, [x0, #8]).

1351

// Paired instruction (e.g., ldp x1, x2, [x0, #8]).

This revision now requires changes to proceed.Feb 17 2016, 7:20 AM
mcrosier added inline comments.Feb 17 2016, 7:30 AM
llvm/test/CodeGen/AArch64/ldst-pairing.ll
1

You may need to disable the MI scheduler for this to work. I believe you can do this by specifying -disable-post-ra on the llc command line.

mcrosier accepted this revision.Mar 7 2016, 10:26 AM
mcrosier edited edge metadata.

LGTM, assuming there are no correctness issues.

Please only commit the st2 test as the other tests are either not related to the patch our are redundant with st2. Also, the test can be included in arm64-stp-aa.ll, rather than creating a new file.

llvm/test/CodeGen/AArch64/ldst-pairing.ll
8

This tests the most basic pairing capabilities and is not relevant to the patch at hand (i.e., this passes without this patch).

30

This is the most relevant test to the commit. To properly test the patch the pre-RA MI scheduler must be disabled (-enable-misched=false) and D17097 must also be applied.

62

Same story at st1. This tests the most basic pairing capabilities and is not relevant to the patch at hand (i.e., this passes without this patch).

81

This is redundant and not needed.

103

This test is not relevant to this patch. It can be committed separately with a FIXME comment in AArch64LoadStoreOptimizer.cpp.

This revision is now accepted and ready to land.Mar 7 2016, 10:26 AM
mcrosier requested changes to this revision.Mar 8 2016, 7:30 AM
mcrosier edited edge metadata.

I wanted to push this patch along, so I ran some correctness tests. It turns out the paired instructions aren't being scaled properly (see inline comments). After addressing those issues I'm now seeing a few regressions due to fewer stp instructions. I'm going to block this patch while I investigate.

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
1405

Scale = 16;
Width = 32;

1414

Scale = 8;
Width = 16;

1426

Scale = 4;
Width = 8;

This revision now requires changes to proceed.Mar 8 2016, 7:30 AM
mcrosier commandeered this revision.Mar 8 2016, 11:57 AM
mcrosier edited reviewers, added: sebpop; removed: mcrosier.
mcrosier updated this revision to Diff 50064.Mar 8 2016, 12:05 PM

Correct scale/offset issues and address all previous feedback.

sebpop edited edge metadata.Mar 8 2016, 5:58 PM

Correct scale/offset issues and address all previous feedback.

The corrected version looks very similar to what I remember having implemented first, and then we found out that the scale did not match.
Let's not apply this patch yet: I need to discuss this again with my colleague Abderrazek and we will try to find why we changed to scale = width for ldp/stp formation. We will come back with a counter-example if we find one.

Correct scale/offset issues and address all previous feedback.

The corrected version looks very similar to what I remember having implemented first, and then we found out that the scale did not match.
Let's not apply this patch yet: I need to discuss this again with my colleague Abderrazek and we will try to find why we changed to scale = width for ldp/stp formation. We will come back with a counter-example if we find one.

Sounds good. Let me know if I've missed something. The matching scale and offsets caused a large number of correctness failures in the test-suite as well as SPEC200X in my testing.

Let me know if I've missed something. The matching scale and offsets caused a large number of correctness failures in the test-suite as well as SPEC200X in my testing.

Chad, your changes to the patch are correct: the ARM documentation states that in the case of a pair of 32bit regs the immediate byte offset is a multiple of 4.
The patch LGTM.

Ping.
Abderrazek has looked at the code produced with the patch on xalan benchmark
and he found that with the patch there are 13 more stp, and not fewer stp.
Ok to commit?

sebpop added a comment.Apr 7 2016, 2:40 PM

Chad, is it ok to commit this patch?

Please rebase it. LGTM.

lib/Target/AArch64/AArch64InstrInfo.cpp
1351 ↗(On Diff #50064)

Please use getNumExplicitOperands().

mcrosier accepted this revision.Apr 15 2016, 9:10 AM
mcrosier added a reviewer: mcrosier.

Please rebase it. LGTM.

Approved per Jun and Sebastian. Jun verified the regression I was seeing no longer exists. I'll go ahead and rebase the patch and commit with Jun's suggestion to use getNumExplicitOperands(). Thanks, Jun/Sebastian.

This revision is now accepted and ready to land.Apr 15 2016, 9:10 AM
mcrosier closed this revision.Apr 15 2016, 11:19 AM

Committed in r266462.