Fix early-exit analysis for memory operation pairing when operations are
not emitted in ascending order.
Repository: rL LLVM
To add some context: the behavior can be seen in merge_zr32_2_offset, where the upstream combiner-alias-analysis change places the two stores in descending order, which the load/store optimizer then fails to catch.
The two changed test cases are because the current store-merging analysis merges some of the stores in these test cases when it should not. This is resolved by D14834.
lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
1473–1474 ↗ (On Diff #82938): In that case, can you see better st/ld pairs with this change? Can you add a test case if you have one? Also, is there a case where this change causes worse combinations of st/ld pairs? In some cases, this change may unnecessarily call findMatchingInsn():
STR %XZR, %X0, 64 <--- we don't even need to try to look up a pair for this
STR %XZR, %X0, 65
test/CodeGen/AArch64/ldst-opt.ll
1418–1421 ↗ (On Diff #82938): To me, this doesn't seem to be related to your change in AArch64LoadStoreOptimizer.cpp.
Looks like this change catches the case that happens in merge_zr32_2_offset():
STRXui %XZR, %X0, 64; STRXui %XZR, %X0, 63;
as we perform only a forward search when finding the matching instruction.
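The interaction of a forward-only search with descending offsets can be illustrated with a small, self-contained sketch. This is a deliberate simplification, not the actual AArch64LoadStoreOptimizer logic: it models each store by its immediate offset only, and pairs an instruction with a later one at an adjacent offset in either direction, which is why stores emitted in descending order (offset 64 followed by 63) can still be paired.

```python
def pair_adjacent_stores(offsets):
    """Greedy forward-only scan (hypothetical model of pairing): for each
    unpaired store, look ahead for a later store at offset +1 or -1 and
    pair the two. Returns a list of (index, index) pairs."""
    paired = [False] * len(offsets)
    pairs = []
    for i, off in enumerate(offsets):
        if paired[i]:
            continue
        for j in range(i + 1, len(offsets)):
            # Adjacency in either direction covers descending emission order.
            if not paired[j] and abs(offsets[j] - off) == 1:
                paired[i] = paired[j] = True
                pairs.append((i, j))
                break
    return pairs

# Descending order, as in merge_zr32_2_offset: offsets 64 then 63.
print(pair_adjacent_stores([64, 63]))  # → [(0, 1)]
# Non-adjacent offsets find no pair.
print(pair_adjacent_stores([64, 66]))  # → []
```

A forward-only scan that checked only for a later store at offset +1 would miss the descending case entirely, which is the gap this patch addresses in the early-exit analysis.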
I think this test case exposes the underlying issue:
define i64 @test(i64* %a) nounwind {
  %p1 = getelementptr inbounds i64, i64* %a, i32 64
  %tmp1 = load i64, i64* %p1, align 2
  %p2 = getelementptr inbounds i64, i64* %a, i32 63
  %tmp2 = load i64, i64* %p2, align 2
  %tmp3 = add i64 %tmp1, %tmp2
  ret i64 %tmp3
}
When you run llc, make sure to disable the scheduler so the instructions aren't reordered, by passing -enable-misched=false and -enable-post-misched=false.
Reproduce with:
llc -mtriple=aarch64-linux-gnu -aarch64-enable-atomic-cfg-tidy=0 -disable-lsr -verify-machineinstrs -enable-misched=false -enable-post-misched=false -o - test.ll
If I'm not mistaken, without this patch we won't pair the instructions.
If Jun doesn't oppose, this change LGTM. However, before committing please rebase the patch with a more straightforward test case (i.e., see my previous comment).
lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
1473–1474 ↗ (On Diff #82938): In my limited testing (i.e., SPEC2006), this strictly increased the number of load/store pairs.
This change LGTM. As Chad mentioned, it would be good to add a clear test case that specifically shows the case handled by this change, like the one from Chad's comment.
lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
1474 ↗ (On Diff #82938): It would be good to have a comment explaining the case handled here.