This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Fix over-eager early-exit in load-store combiner
ClosedPublic

Authored by niravd on Jan 3 2017, 12:59 PM.

Details

Summary

Fix early-exit analysis for memory operation pairing when operations are
not emitted in ascending order.

Event Timeline

niravd updated this revision to Diff 82938.Jan 3 2017, 12:59 PM
niravd retitled this revision from to [AArch64] Fix over-eager early-exit in load-store combiner.
niravd added reviewers: mcrosier, t.p.northover.
niravd updated this object.
niravd added a subscriber: llvm-commits.
niravd added a comment.Jan 4 2017, 9:48 AM

To add some context, the behavior can be seen by a change in behavior of "merge_zr32_2_offset" which with upstream combiner-alias-analysis places the two stores in descending order which the load/store optimizer fails to catch.

The two changed test cases are because the current store merging analysis fails to not merge some of the stores in the these test cases. This is resolved by D14834.

junbuml added inline comments.Jan 4 2017, 10:07 AM
lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
1473–1474

In which case, can you see better st/ld pairs with this change ? Can you add a case if you have? Also, is there a case on which this change cause worse combinations of st/ld pairs?

In some case, this change may unnecessarily call findMatchingInsn() :
e.g.,

STR %XZR, %X0, 64  <--- we don't even need to try to look up pair for this
STR %XZR, %X0, 65
test/CodeGen/AArch64/ldst-opt.ll
1418–1421

For me, this doesn't seem to be related with your change in AArch64LoadStoreOptimizer.cpp

junbuml edited edge metadata.Jan 4 2017, 10:26 AM

Looks like this change catch the case which happen in merge_zr32_2_offset() :

STRXui %XZR, %X0, 64; 
STRXui %XZR, %X0, 63;

as we perform only forward search when finding matching instruction.

mcrosier edited edge metadata.Jan 4 2017, 11:31 AM

I think this test case exposes the underlying issue:

define i64 @test(i64* %a) nounwind {
  %p1 = getelementptr inbounds i64, i64* %a, i32 64
  %tmp1 = load i64, i64* %p1, align 2
  %p2 = getelementptr inbounds i64, i64* %a, i32 63
  %tmp2 = load i64, i64* %p2, align 2
  %tmp3 = add i64 %tmp1, %tmp2
  ret i64 %tmp3
}

When you run llc make sure to disable the scheduler so that the instructions aren't reordered by using -enable-misched=false and -enable-post-misched=false.

Reproduce with:

llc -mtriple=aarch64-linux-gnu -aarch64-enable-atomic-cfg-tidy=0 -disable-lsr -verify-machineinstrs -enable-misched=false -enable-post-misched=false -o - test.ll

If I'm not mistaken, without this patch we won't pair the instructions.

If Jun doesn't oppose, this change LGTM. However, before committing please rebase the patch with a more straight forward test case (i.e., see my previous comment).

lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
1473–1474

In my limited testing (i.e., SPEC2006), this strictly increased the number of load/store pairs.

mcrosier accepted this revision.Jan 4 2017, 11:46 AM
mcrosier edited edge metadata.
This revision is now accepted and ready to land.Jan 4 2017, 11:46 AM

This change LGTM. As Chad mention, it will be good to add a clear test case which specially shows the case handled by this change like the one from Chad's comment.

lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
1474

It will be good to have a comment explaining the case handled by doing this.

This revision was automatically updated to reflect the committed changes.