Fix early-exit analysis for memory operation pairing when operations are
not emitted in ascending order.
Repository: rL LLVM
To add some context: the behavior can be seen in merge_zr32_2_offset, where the upstream combiner-alias-analysis change places the two stores in descending order, which the load/store optimizer then fails to catch.
The two changed test cases are because the current store-merging analysis merges some of the stores in these test cases when it should not. This is resolved by D14834.
lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
1473–1474 ↗ (On Diff #82938): In that case, can you see better st/ld pairs with this change? Can you add a test case if you have one? Also, is there a case where this change causes worse combinations of st/ld pairs? In some cases, this change may unnecessarily call findMatchingInsn():
STR %XZR, %X0, 64 <--- we don't even need to try to look up a pair for this
STR %XZR, %X0, 65
test/CodeGen/AArch64/ldst-opt.ll
1418–1421 ↗ (On Diff #82938): To me, this doesn't seem to be related to your change in AArch64LoadStoreOptimizer.cpp.
Looks like this change catches the case that happens in merge_zr32_2_offset():
STRXui %XZR, %X0, 64; STRXui %XZR, %X0, 63;
as we perform only a forward search when finding the matching instruction.
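The interaction of a forward-only search with descending offsets can be illustrated with a small, self-contained sketch. This is a deliberate simplification, not the actual AArch64LoadStoreOptimizer logic: it models each store by its immediate offset only, and pairs an instruction with a later one at an adjacent offset in either direction, which is why stores emitted in descending order (offset 64 followed by 63) can still be paired.

```python
def pair_adjacent_stores(offsets):
    """Greedy forward-only scan (hypothetical model of pairing): for each
    unpaired store, look ahead for a later store at offset +1 or -1 and
    pair the two. Returns a list of (index, index) pairs."""
    paired = [False] * len(offsets)
    pairs = []
    for i, off in enumerate(offsets):
        if paired[i]:
            continue
        for j in range(i + 1, len(offsets)):
            # Adjacency in either direction covers descending emission order.
            if not paired[j] and abs(offsets[j] - off) == 1:
                paired[i] = paired[j] = True
                pairs.append((i, j))
                break
    return pairs

# Descending order, as in merge_zr32_2_offset: offsets 64 then 63.
print(pair_adjacent_stores([64, 63]))  # → [(0, 1)]
# Non-adjacent offsets find no pair.
print(pair_adjacent_stores([64, 66]))  # → []
```

A forward-only scan that checked only for a later store at offset +1 would miss the descending case entirely, which is the gap this patch addresses in the early-exit analysis.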
I think this test case exposes the underlying issue:
define i64 @test(i64* %a) nounwind {
  %p1 = getelementptr inbounds i64, i64* %a, i32 64
  %tmp1 = load i64, i64* %p1, align 2
  %p2 = getelementptr inbounds i64, i64* %a, i32 63
  %tmp2 = load i64, i64* %p2, align 2
  %tmp3 = add i64 %tmp1, %tmp2
  ret i64 %tmp3
}
When you run llc, make sure to disable the scheduler so the instructions aren't reordered, by passing -enable-misched=false and -enable-post-misched=false.
Reproduce with:
llc -mtriple=aarch64-linux-gnu -aarch64-enable-atomic-cfg-tidy=0 -disable-lsr -verify-machineinstrs -enable-misched=false -enable-post-misched=false -o - test.ll
If I'm not mistaken, without this patch we won't pair the instructions.
If Jun doesn't oppose, this change LGTM. However, before committing please rebase the patch with a more straightforward test case (i.e., see my previous comment).
lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
1473–1474 ↗ (On Diff #82938): In my limited testing (i.e., SPEC2006), this strictly increased the number of load/store pairs.
This change LGTM. As Chad mentioned, it would be good to add a clear test case that specifically shows the case handled by this change, like the one from Chad's comment.
lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
1474 ↗ (On Diff #82938): It would be good to have a comment explaining the case handled here.