This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/
-
lib/Target/AArch64/
-
Target/
-
AArch64/
14/19
AArch64InstrInfo.cpp
10/11
AArch64LoadStoreOptimizer.cpp
-
test/CodeGen/AArch64/
-
CodeGen/
-
AArch64/
2/2
arm64-memset-inline.ll
5/7
ldrpre-ldr-merge.mir
6/6
strpre-str-merge.mir

Differential D99272

[AArch64] Adds a pre-indexed paired Load/Store optimization for LDR-STR.
ClosedPublic

Authored by stelios-arm on Mar 24 2021, 8:50 AM.

Download Raw Diff

Details

Reviewers

SjoerdMeijer
dmgreen
sanwou01
samparker
fhahn
NickGuy

Commits

rG936c777e2bf8: [AArch64] Adds a pre-indexed paired Load/Store optimization for LDR-STR.

Summary

This patch merges STR<S,D,Q,W,X>pre-STR<S,D,Q,W,X>ui and LDR<S,D,Q,W,X>pre-LDR<S,D,Q,W,X>ui instruction pairs into a single STP<S,D,Q,W,X>pre and LDP<S,D,Q,W,X>pre instruction, respectively.

For each pair, there is a MIR test that verifies this optimization.

This was a missed opportunity in the AArch64 load/store optimiser for cases like this:

#define float32_t float
#define uint32_t unsigned

void test(float32_t * S, float32_t * D, uint32_t N) {
  for (uint32_t i = 0; i < N; i++) {
    D[i] = D[i] + S[i];
  }
}

When compiled with:

-Ofast -target aarch64-arm-none-eabi -mcpu=cortex-a55 -mllvm -lsr-preferred-addressing-mode=preindexed

It results in:

.LBB0_9:                                // =>This Inner Loop Header: Depth=1
        ldr     q0, [x11, #32]!
        ldr     q1, [x11, #16]
        subs    x12, x12, #8                    // =8
        ldr     q2, [x10, #32]!
        ldr     q3, [x10, #16]
        fadd    v0.4s, v2.4s, v0.4s
        fadd    v1.4s, v3.4s, v1.4s
        stp     q0, q1, [x11]
        b.ne    .LBB0_9

where:

ldr     q0, [x11, #32]!
ldr     q1, [x11, #16]

should be:

ldp	q0, q1, [x11, #32]!

Additionally for cases like:

define <4 x i32>* @strqpre-strqui-merge(<4 x i32>* %p, <4 x i32> %a, <4 x i32> %b) {
entry:
  %p0 = getelementptr <4 x i32>, <4 x i32>* %p, i32 2
  store <4 x i32> %a, <4 x i32>* %p0
  %p1 = getelementptr <4 x i32>, <4 x i32>* %p, i32 3
  store <4 x i32> %b, <4 x i32>* %p1
  ret <4 x i32>* %p0
}

It results in:

"strqpre-strqui-merge":                 // @strqpre-strqui-merge
        str     q0, [x0, #32]!
        str     q1, [x0, #16]
        ret

where the store instruction should be merged with:

stp	q0, q1, [x0, #32]!

This patch covers such cases including the various forms of STR<>pre/LDR<>pre.

Diff Detail

Event Timeline

stelios-arm created this revision.Mar 24 2021, 8:50 AM

Herald added subscribers: danielkiss, arphaman, hiraditya, kristof.beyls. · View Herald TranscriptMar 24 2021, 8:50 AM

stelios-arm requested review of this revision.Mar 24 2021, 8:50 AM

Herald added a project: Restricted Project. · View Herald TranscriptMar 24 2021, 8:50 AM

Herald added a subscriber: llvm-commits. · View Herald Transcript

stelios-arm added inline comments.Mar 24 2021, 9:07 AM

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
2233	For `LDRQpre` , `MI.hasOrderedMemoryRef()`results to true because the instruction has no memory reference information, and conservatively assumes it wasn't preserved. Therefore, I added: && MI.getOpcode() != AArch64::LDRQpre to ignore it for this instruction. I suppose there is a better way of doing it, but I am not yet sure how.
2277	Note for `LDRQpre` instructions it should be `MI.getOperand(2).getReg()`, and also `BaseReg` should be `Register BaseReg = MI.getOperand(2).getReg()`. This can be easily fixed, however it won’t do much because `MI.modifiesRegister(BaseReg, TRI)` will again result to true. Any suggestions?
llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
1068	Mis-indentation. I am going to fix this is in an updated revision.
1352	Ditto.

Hi Stelios, many thanks for putting this together, good stuff.
I will do a code-review a bit later, but as there's potential for some corner cases here, first a testing question. Did you do a bootstrap build and e.g. ran the llvm test suite?

Harbormaster completed remote builds in B95506: Diff 333004.Mar 24 2021, 4:17 PM

dmgreen added inline comments.Mar 25 2021, 5:23 AM

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
2106	We should try and add all the various forms of STR?pre/LDR?pre. Hopefully they all work the same way, with the same operands.
2233	Why does the LDRQpre have no memory operand?
2264	-> operand Formatting looks a bit off, which might or might not be fixed by just running clang-format on the patch.
llvm/test/CodeGen/AArch64/strqpre-strqui-merge.mir
64 ↗	(On Diff #333004)	This should presumably be the before-STPQpre code, that is then converted to a STPQpre by the pass. You can often remove a lot of the stuff above, like the frameInfo and all the regBankSelected stuff. And there is an update_mir_test_checks for generating check lines.

Added all the various forms of STR<>pre/LDR<>pre.
Added additional test cases for the MIR tests to cover the various forms of STR<>pre/LDR<>pre.
Added constraints so that it optimizes cases where the offset of the second LDR/STR<>ui is equal to the size of the destination register. Additionally, it only optimizes cases where the base register of the pre-index LDR/STRpre<> is not used or modified.
Did a bootstrap build and ran the llvm test-suite on an AArch64 machine. Both the test-suite and regression tests results in no errors.
Currently there is a hack to avoid the memoperands_empty() check for LDR<>pre instructions. This is because they are missing the load memory operand. See below:

early-clobber renamable $x1, renamable $w0 = LDRWpre killed renamable $x1, 20

instead it should look something similar to:

early-clobber renamable $x1, renamable $w0 = LDRWpre killed renamable $x1, 20 :: (load 4)

This is going to be addressed in another patch, and then this patch will be updated to remove the hack that is in place.

In D99272#2647947, @SjoerdMeijer wrote:

Hi Stelios, many thanks for putting this together, good stuff.
I will do a code-review a bit later, but as there's potential for some corner cases here, first a testing question. Did you do a bootstrap build and e.g. ran the llvm test suite?

This was done for the second revision.

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
2233	I added an explanation in the new revision (Point 5). This is going to be addressed in another patch and this patch will be updated accordingly.

stelios-arm added inline comments.Apr 9 2021, 4:50 AM

llvm/test/CodeGen/AArch64/arm64-memset-inline.ll

In case you are wondering why with the new patch this is changed to and STP here is the full check commands pre-patch:

; CHECK-LABEL: bzero_8_stack:
; CHECK:       // %bb.0:
; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
; CHECK-NEXT:    .cfi_def_cfa_offset 16
; CHECK-NEXT:    .cfi_offset w30, -16
; CHECK-NEXT:    add x0, sp, #8 // =8
; CHECK-NEXT:    str xzr, [sp, #8]
; CHECK-NEXT:    bl something
; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
; CHECK-NEXT:    ret

233

Similarly:

; CHECK-LABEL: memset_8_stack:
; CHECK:       // %bb.0:
; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
; CHECK-NEXT:    .cfi_def_cfa_offset 16
; CHECK-NEXT:    .cfi_offset w30, -16
; CHECK-NEXT:    mov x8, #-6148914691236517206
; CHECK-NEXT:    add x0, sp, #8 // =8
; CHECK-NEXT:    str x8, [sp, #8]
; CHECK-NEXT:    bl something
; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
; CHECK-NEXT:    ret

Harbormaster completed remote builds in B97938: Diff 336390.Apr 9 2021, 5:37 AM

SjoerdMeijer added inline comments.Apr 9 2021, 11:12 AM

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
2253	Just make this dependent on D100215 and modify the code here accordingly. I guess that simply means removing the FIXME and the IsPreLD check.

SjoerdMeijer mentioned this in D89693: [AArch64] Favor pre-increments and implement TTI::getPreferredAddressingMode.Apr 12 2021, 2:15 AM

Matt added a subscriber: Matt.Apr 12 2021, 5:25 AM

Removed the hack that was used to avoid the memoperands_empty() check for LDR<>pre instructions.

Harbormaster completed remote builds in B98449: Diff 337087.Apr 13 2021, 3:59 AM

It would be good to see some extra tests for various edge cases, like offsets near to the boundaries and different pairs of instructions being combined/not.

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
1972	I feel like "Unscaled" instructions are a set of instructions on their own right. Can we rename the function to something like hasUnscaledLdStOffset to make it clear what it means now?
2277	The IsPreLd can move to the outer if, and can be IsPreSt too? The register it is using isn't correct, but it's unpredictable for a writeback load/store to load/store the same register as the operand. So the check is OK to skip for preinc loads/stores.
2821	Should this get some updates to make it more precise for pre-inc pairs?
llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
572	This is only used in one place, where it could be checking isPreLdSt on the original opcode?
675	+= 1?
1653	-> Additionally -> operations
1673	Why does this not already handle the combining of PreLdStPair? The existing code can combine in both directions. Presumably it's only valid for forward now?

stelios-arm marked 5 inline comments as done.Apr 16 2021, 8:54 AM

stelios-arm added inline comments.

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
2821	Could be. Currently, this method is called for load/stores when `getMemOperandWithOffset` returns `true`. However, `getMemOperandWithOffset` has a call to `getMemOpInfo`: // If this returns false, then it's an instruction we don't want to handle. if (!getMemOpInfo(LdSt.getOpcode(), Scale, Width, Dummy1, Dummy2)) return false; Currently, the `getMemOpInfo` does not include the pre-inc instructions. Additionally, for the post-inc instructions, there is only one variant available for `STRWpost`/`LDRWpost`. I am not sure why the other variants are not included, so I am bit skeptical If the pre-inc instructions should be added. What do you think?
llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
1673	For example, the following instructions: str w1 [x0, #20]! str w2 [x0, #4] Can be paired to: Stp w1, w2, [x0, #20]! The offset of the first and second instruction is `20` and `4`, respectively. The offset stride is `4`. Therefore, the check `(Offset == MIOffset + OffsetStride)` and `(Offset + OffsetStride == MIOffset)` will return `false`. That’s is why it’s needed. And yes, for such cases it’s only valid for forward now, since the order of the instructions matters for this optimization.

Added more test cases.
Addressed the remarks.

Harbormaster completed remote builds in B99197: Diff 338139.Apr 16 2021, 10:59 AM

I presume that everything that uses hasUnscaledLdStOffset is still OK? It would either handle pre loads/stores or not reach them as pre loads/stores?

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
2235	Can these use isPreLd?
2260	Is this assert valid/sensible for PreInc?
llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
1062	Does this need the second condition in the if? Should that always be true if the first part is a pre?
1673	OK, but it looks like the existing `(Offset == MIOffset + OffsetStride)` conditions could be true for preinc where we don't want them to be. Can we switch it around to be something like: bools IsPreLdSt = isPreLdStPairCandidate(..) if (!IsPreLdSt) { check conditions else continue } else { check pre conditions else continue } That way we don't need the extra indenting, and the conditions don't get muddled together.
llvm/test/CodeGen/AArch64/ldrpre-ldr-merge.mir
401	Comments can be added here with `;`
llvm/test/CodeGen/AArch64/strpre-str-merge.mir
268	-257? If the strp has to be aligned, is it worth adding a test for +/-260 too?
329	store 4
358	What happens if this is 16? (Or 4)
390	I tried to come up with a list of tests. You have most of the covered, I also came up with these, some of which might be good to make sure are covered: Given ldrqpre a, [b, c] and ldrqui d, [e, f] q with d load with store a == b? No sure what happens then b != e. That's probably tested naturally anyway. second instruction is ldruqui. There were some others but they sound less useful.

Addressed the comments made by @dmgreen (Thanks for the comments!)

Added more test cases
Refactoring
Added support for LDUR<>i<>Ri/STUR<>i
Changed the code so that the same Pre Ld/St Opcodes are not candidates to merge/pair.

stelios-arm added inline comments.Apr 22 2021, 7:15 AM

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
2260	The assert is valid for pre-inc ld/st because `MI.getOperand(1)` is the destination register.
llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
1062	If it reaches at this point, the second condition is redundant. Therefore, I will remove it.
llvm/test/CodeGen/AArch64/strpre-str-merge.mir
358	It exhibits the same behaviour.
358	They will not be merged.

Harbormaster completed remote builds in B100263: Diff 339614.Apr 22 2021, 9:04 AM

dmgreen added inline comments.Apr 26 2021, 4:27 AM

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
2260	Valid - sure. But sensible? It's trying to check that the address operand is a Reg/FrameIndex. With Preinc, that will be shifted one operand. Can we make sure it checks the correct operand for those too?
2821	We may want to make it more precise, but for the moment it's probably fine so long as it's not going to get anything drastically wrong. It clusters memory during scheduling to potentially increase the number of combined loads later on.
llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
1362	Maybe phrase this as "Opcodes match: If the opcodes are pre ld/st there is nothing more to check."
llvm/test/CodeGen/AArch64/ldrpre-ldr-merge.mir
447	store 8
616	According to this the limit of a LDP is 1008: https://godbolt.org/z/613xsozqP. So 1024 is the first multiple of 16 that should be invalid.

dmgreen added inline comments.Apr 26 2021, 5:22 AM

llvm/test/CodeGen/AArch64/ldrpre-ldr-merge.mir
18	Although I don't think it's an issue with this patch exactly, should this not have two MMO's? Either be :: (load 8) or :: (load 4) (load 4) ?

stelios-arm marked 4 inline comments as done.Apr 30 2021, 4:11 AM

stelios-arm added inline comments.

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
2260	Oh yes, sure!
2821	I agree, maybe that's the next step.
llvm/test/CodeGen/AArch64/ldrpre-ldr-merge.mir
18	Yes, it should have "(load 4), (load 4)" instead.
616	According to this, the offset of LDR<>pre/STR<>pre is in range `[-256,255]`. Assuming that `x∈[-256,255]` and `c = size of <S,D,Q,W,X>` , for this type of optimization the resulted LDP<>pre/STP<>pre will be in range of `[min(x mod c == 0), max(x mod c == 0)]`.

Addressed the comments made by @dmgreen.

Harbormaster completed remote builds in B101883: Diff 341838.Apr 30 2021, 4:47 AM

Thanks for adding all these tests. LGTM.

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
2265	I think it's probably OK to check assert((MI.getOperand(IsPreLdSt ? 2 : 1).isReg() \|\| MI.getOperand(IsPreLdSt ? 2 : 1).isFI())...
llvm/test/CodeGen/AArch64/ldrpre-ldr-merge.mir
18	OK. It's because they are identical, so are elided during the merging of memory operands. It's a little strange to not see both, but fine. You don't get any extra information out of having both in this case.

This revision is now accepted and ready to land.Apr 30 2021, 6:07 AM

Closed by commit rG936c777e2bf8: [AArch64] Adds a pre-indexed paired Load/Store optimization for LDR-STR. (authored by stelios-arm). · Explain WhyApr 30 2021, 9:30 AM

This revision was automatically updated to reflect the committed changes.

stelios-arm marked an inline comment as done.

stelios-arm added a commit: rG936c777e2bf8: [AArch64] Adds a pre-indexed paired Load/Store optimization for LDR-STR..

nickdesaulniers added subscribers: samitolvanen, nathanchance, eli.friedman, nickdesaulniers.May 4 2021, 12:05 PM

As a heads up, this commit seems to be causing some pretty spooky boot failures for aarch64 Linux kernels built with LTO (non-thin). We're trying to isolate the bug in https://github.com/ClangBuiltLinux/linux/issues/1368, but so far all we know is that modifications to AArch64::STRXpre seem to be solely responsible for the boot failures.

Hello. This is for the 64bit store? Hmm. That might mean that the address and the operands are being re-used. Maybe?

Are the linux builds easy to reproduce? Do you have details anywhere?

Do we have a test case for this example?

early-clobber renamable $x0 = STRXpre killed renamable $x1, killed renamable $x0, 24 :: (store 8)
STRXui killed renamable $x0, renamable $x0, 1 :: (store 8)

@nickdesaulniers This patch is a possible fix for the issue, do you mind testing if it's now OK?
@dmgreen I added one in the patch.

In D99272#2738358, @stelios-arm wrote:

@nickdesaulniers This patch is a possible fix for the issue, do you mind testing if it's now OK?
@dmgreen I added one in the patch.

I tested that patch and it resolves the issue for me, thanks!

@nathanchance Great! Thanks for the update.

We tracked some test failures in Chromium down to this patch (https://crbug.com/1205459) and it appears the fix (D101888) fixed it for us too.

chaosdefinition mentioned this in D152407: [AArch64] Merge LDRSWpre-LD[U]RSW pair into LDPSWpre..Jun 7 2023, 4:18 PM

chaosdefinition mentioned this in rGb0093e13fcfd: [AArch64] Merge LDRSWpre-LD[U]RSW pair into LDPSWpre.Jul 18 2023, 9:47 AM

Revision Contents

Path

Size

llvm/

lib/

Target/

AArch64/

AArch64InstrInfo.cpp

71 lines

AArch64LoadStoreOptimizer.cpp

182 lines

test/

CodeGen/

AArch64/

arm64-memset-inline.ll

8 lines

ldrpre-ldr-merge.mir

133 lines

strpre-str-merge.mir

136 lines

Diff 336390

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp

Show First 20 Lines • Show All 1,963 Lines • ▼ Show 20 Lines

/// Check all MachineMemOperands for a hint that the load/store is strided.		/// Check all MachineMemOperands for a hint that the load/store is strided.
bool AArch64InstrInfo::isStridedAccess(const MachineInstr &MI) {		bool AArch64InstrInfo::isStridedAccess(const MachineInstr &MI) {
return llvm::any_of(MI.memoperands(), [](MachineMemOperand *MMO) {		return llvm::any_of(MI.memoperands(), [](MachineMemOperand *MMO) {
return MMO->getFlags() & MOStridedAccess;		return MMO->getFlags() & MOStridedAccess;
});		});
}		}

bool AArch64InstrInfo::isUnscaledLdSt(unsigned Opc) {		bool AArch64InstrInfo::isUnscaledLdSt(unsigned Opc) {
		dmgreenUnsubmitted Done Reply Inline Actions I feel like "Unscaled" instructions are a set of instructions on their own right. Can we rename the function to something like hasUnscaledLdStOffset to make it clear what it means now? dmgreen: I feel like "Unscaled" instructions are a set of instructions on their own right. Can we rename…
switch (Opc) {		switch (Opc) {
default:		default:
return false;		return false;
case AArch64::STURSi:		case AArch64::STURSi:
		case AArch64::STRSpre:
case AArch64::STURDi:		case AArch64::STURDi:
		case AArch64::STRDpre:
case AArch64::STURQi:		case AArch64::STURQi:
		case AArch64::STRQpre:
case AArch64::STURBBi:		case AArch64::STURBBi:
case AArch64::STURHHi:		case AArch64::STURHHi:
case AArch64::STURWi:		case AArch64::STURWi:
		case AArch64::STRWpre:
case AArch64::STURXi:		case AArch64::STURXi:
		case AArch64::STRXpre:
case AArch64::LDURSi:		case AArch64::LDURSi:
		case AArch64::LDRSpre:
case AArch64::LDURDi:		case AArch64::LDURDi:
		case AArch64::LDRDpre:
case AArch64::LDURQi:		case AArch64::LDURQi:
		case AArch64::LDRQpre:
case AArch64::LDURWi:		case AArch64::LDURWi:
		case AArch64::LDRWpre:
case AArch64::LDURXi:		case AArch64::LDURXi:
		case AArch64::LDRXpre:
case AArch64::LDURSWi:		case AArch64::LDURSWi:
case AArch64::LDURHHi:		case AArch64::LDURHHi:
case AArch64::LDURBBi:		case AArch64::LDURBBi:
case AArch64::LDURSBWi:		case AArch64::LDURSBWi:
case AArch64::LDURSHWi:		case AArch64::LDURSHWi:
return true;		return true;
}		}
}		}
▲ Show 20 Lines • Show All 92 Lines • ▼ Show 20 Lines
bool AArch64InstrInfo::isPairableLdStInst(const MachineInstr &MI) {		bool AArch64InstrInfo::isPairableLdStInst(const MachineInstr &MI) {
switch (MI.getOpcode()) {		switch (MI.getOpcode()) {
default:		default:
return false;		return false;
// Scaled instructions.		// Scaled instructions.
case AArch64::STRSui:		case AArch64::STRSui:
case AArch64::STRDui:		case AArch64::STRDui:
case AArch64::STRQui:		case AArch64::STRQui:
case AArch64::STRXui:		case AArch64::STRXui:
		dmgreenUnsubmitted Done Reply Inline Actions We should try and add all the various forms of STR?pre/LDR?pre. Hopefully they all work the same way, with the same operands. dmgreen: We should try and add all the various forms of STR?pre/LDR?pre. Hopefully they all work the…
case AArch64::STRWui:		case AArch64::STRWui:
case AArch64::LDRSui:		case AArch64::LDRSui:
case AArch64::LDRDui:		case AArch64::LDRDui:
case AArch64::LDRQui:		case AArch64::LDRQui:
case AArch64::LDRXui:		case AArch64::LDRXui:
case AArch64::LDRWui:		case AArch64::LDRWui:
case AArch64::LDRSWui:		case AArch64::LDRSWui:
// Unscaled instructions.		// Unscaled instructions.
case AArch64::STURSi:		case AArch64::STURSi:
		case AArch64::STRSpre:
case AArch64::STURDi:		case AArch64::STURDi:
		case AArch64::STRDpre:
case AArch64::STURQi:		case AArch64::STURQi:
		case AArch64::STRQpre:
case AArch64::STURWi:		case AArch64::STURWi:
		case AArch64::STRWpre:
case AArch64::STURXi:		case AArch64::STURXi:
		case AArch64::STRXpre:
case AArch64::LDURSi:		case AArch64::LDURSi:
		case AArch64::LDRSpre:
case AArch64::LDURDi:		case AArch64::LDURDi:
		case AArch64::LDRDpre:
case AArch64::LDURQi:		case AArch64::LDURQi:
		case AArch64::LDRQpre:
case AArch64::LDURWi:		case AArch64::LDURWi:
		case AArch64::LDRWpre:
case AArch64::LDURXi:		case AArch64::LDURXi:
		case AArch64::LDRXpre:
case AArch64::LDURSWi:		case AArch64::LDURSWi:
return true;		return true;
}		}
}		}

unsigned AArch64InstrInfo::convertToFlagSettingOpc(unsigned Opc,		unsigned AArch64InstrInfo::convertToFlagSettingOpc(unsigned Opc,
bool &Is64Bit) {		bool &Is64Bit) {
switch (Opc) {		switch (Opc) {
▲ Show 20 Lines • Show All 80 Lines • ▼ Show 20 Lines	case AArch64::SUBXrx:
Is64Bit = true;		Is64Bit = true;
return AArch64::SUBSXrx;		return AArch64::SUBSXrx;
}		}
}		}

// Is this a candidate for ld/st merging or pairing? For example, we don't		// Is this a candidate for ld/st merging or pairing? For example, we don't
// touch volatiles or load/stores that have a hint to avoid pair formation.		// touch volatiles or load/stores that have a hint to avoid pair formation.
bool AArch64InstrInfo::isCandidateToMergeOrPair(const MachineInstr &MI) const {		bool AArch64InstrInfo::isCandidateToMergeOrPair(const MachineInstr &MI) const {
// If this is a volatile load/store, don't mess with it.
if (MI.hasOrderedMemoryRef())		bool IsPreLd = false;
		bool IsPreSt = false;
		stelios-armAuthorUnsubmitted Done Reply Inline Actions For `LDRQpre` , `MI.hasOrderedMemoryRef()`results to true because the instruction has no memory reference information, and conservatively assumes it wasn't preserved. Therefore, I added: && MI.getOpcode() != AArch64::LDRQpre to ignore it for this instruction. I suppose there is a better way of doing it, but I am not yet sure how. stelios-arm: For `LDRQpre` , `MI.hasOrderedMemoryRef()`results to true because the instruction has no memory…
		dmgreenUnsubmitted Not Done Reply Inline Actions Why does the LDRQpre have no memory operand? dmgreen: Why does the LDRQpre have no memory operand?
		stelios-armAuthorUnsubmitted Done Reply Inline Actions I added an explanation in the new revision (Point 5). This is going to be addressed in another patch and this patch will be updated accordingly. stelios-arm: I added an explanation in the new revision (Point 5). This is going to be addressed in another…

		switch (MI.getOpcode()) {
		dmgreenUnsubmitted Done Reply Inline Actions Can these use isPreLd? dmgreen: Can these use isPreLd?
		case AArch64::LDRWpre:
		case AArch64::LDRXpre:
		case AArch64::LDRSpre:
		case AArch64::LDRDpre:
		case AArch64::LDRQpre:
		IsPreLd = true;
		break;
		case AArch64::STRWpre:
		case AArch64::STRXpre:
		case AArch64::STRSpre:
		case AArch64::STRDpre:
		case AArch64::STRQpre:
		IsPreSt = true;
		break;
		}

		// If this is a volatile load/store
		// FIXME: Ignore if the instruction is LDR<S,D,Q,W,X>pre. This is currently
		SjoerdMeijerUnsubmitted Not Done Reply Inline Actions Just make this dependent on D100215 and modify the code here accordingly. I guess that simply means removing the FIXME and the IsPreLD check. SjoerdMeijer: Just make this dependent on D100215 and modify the code here accordingly. I guess that simply…
		// required because the "::(load x)" memory operand is missing from
		// LDR<S,D,Q,W,X>pre instructions.
		if (MI.hasOrderedMemoryRef() && !IsPreLd)
return false;		return false;

// Make sure this is a reg/fi+imm (as opposed to an address reloc).		// Make sure this is a reg/fi+imm (as opposed to an address reloc).
assert((MI.getOperand(1).isReg() \|\| MI.getOperand(1).isFI()) &&		assert((MI.getOperand(1).isReg() \|\| MI.getOperand(1).isFI()) &&
		dmgreenUnsubmitted Not Done Reply Inline Actions Is this assert valid/sensible for PreInc? dmgreen: Is this assert valid/sensible for PreInc?
		stelios-armAuthorUnsubmitted Done Reply Inline Actions The assert is valid for pre-inc ld/st because `MI.getOperand(1)` is the destination register. stelios-arm: The assert is valid for pre-inc ld/st because `MI.getOperand(1)` is the destination register.
		dmgreenUnsubmitted Done Reply Inline Actions Valid - sure. But sensible? It's trying to check that the address operand is a Reg/FrameIndex. With Preinc, that will be shifted one operand. Can we make sure it checks the correct operand for those too? dmgreen: Valid - sure. But sensible? It's trying to check that the address operand is a Reg/FrameIndex.
		stelios-armAuthorUnsubmitted Done Reply Inline Actions Oh yes, sure! stelios-arm: Oh yes, sure!
"Expected a reg or frame index operand.");		"Expected a reg or frame index operand.");
if (!MI.getOperand(2).isImm())
		// For Pre-indexed addressing quadword instructions, the third operand is the
		// immediate value.
		dmgreenUnsubmitted Done Reply Inline Actions -> operand Formatting looks a bit off, which might or might not be fixed by just running clang-format on the patch. dmgreen: -> operand Formatting looks a bit off, which might or might not be fixed by just running clang…
		bool IsImmPreLdSt = (IsPreLd \|\| IsPreSt) && MI.getOperand(3).isImm();
		dmgreenUnsubmitted Done Reply Inline Actions I think it's probably OK to check assert((MI.getOperand(IsPreLdSt ? 2 : 1).isReg() \|\| MI.getOperand(IsPreLdSt ? 2 : 1).isFI())... dmgreen: I think it's probably OK to check ``` assert((MI.getOperand(IsPreLdSt ? 2 : 1).isReg() \|\|…

		if (!MI.getOperand(2).isImm() && !IsImmPreLdSt)
return false;		return false;

// Can't merge/pair if the instruction modifies the base register.		// Can't merge/pair if the instruction modifies the base register.
// e.g., ldr x0, [x0]		// e.g., ldr x0, [x0]
// This case will never occur with an FI base.		// This case will never occur with an FI base.
		// However, if the instruction is an LDRQpre, it can be merged, e.g.,
		// ldr q0, [x11, #32]!
		// ldr q1, [x11, #16]
		// to ldp q0, q1, [x11, #32]!
if (MI.getOperand(1).isReg()) {		if (MI.getOperand(1).isReg()) {
		stelios-armAuthorUnsubmitted Done Reply Inline Actions Note for `LDRQpre` instructions it should be `MI.getOperand(2).getReg()`, and also `BaseReg` should be `Register BaseReg = MI.getOperand(2).getReg()`. This can be easily fixed, however it won’t do much because `MI.modifiesRegister(BaseReg, TRI)` will again result to true. Any suggestions? stelios-arm: Note for `LDRQpre` instructions it should be `MI.getOperand(2).getReg()`, and also `BaseReg`…
		dmgreenUnsubmitted Done Reply Inline Actions The IsPreLd can move to the outer if, and can be IsPreSt too? The register it is using isn't correct, but it's unpredictable for a writeback load/store to load/store the same register as the operand. So the check is OK to skip for preinc loads/stores. dmgreen: The IsPreLd can move to the outer if, and can be IsPreSt too? The register it is using isn't…
Register BaseReg = MI.getOperand(1).getReg();		Register BaseReg = MI.getOperand(1).getReg();
const TargetRegisterInfo *TRI = &getRegisterInfo();		const TargetRegisterInfo *TRI = &getRegisterInfo();
if (MI.modifiesRegister(BaseReg, TRI))		if (MI.modifiesRegister(BaseReg, TRI) && !IsPreLd)
return false;		return false;
}		}

// Check if this load/store has a hint to avoid pair formation.		// Check if this load/store has a hint to avoid pair formation.
// MachineMemOperands hints are set by the AArch64StorePairSuppress pass.		// MachineMemOperands hints are set by the AArch64StorePairSuppress pass.
if (isLdStPairSuppressed(MI))		if (isLdStPairSuppressed(MI))
return false;		return false;

▲ Show 20 Lines • Show All 409 Lines • ▼ Show 20 Lines	int AArch64InstrInfo::getMemScale(unsigned Opc) {
case AArch64::LDURHHi:		case AArch64::LDURHHi:
case AArch64::LDRSHWui:		case AArch64::LDRSHWui:
case AArch64::LDURSHWi:		case AArch64::LDURSHWi:
case AArch64::STRHHui:		case AArch64::STRHHui:
case AArch64::STURHHi:		case AArch64::STURHHi:
return 2;		return 2;
case AArch64::LDRSui:		case AArch64::LDRSui:
case AArch64::LDURSi:		case AArch64::LDURSi:
		case AArch64::LDRSpre:
case AArch64::LDRSWui:		case AArch64::LDRSWui:
case AArch64::LDURSWi:		case AArch64::LDURSWi:
		case AArch64::LDRWpre:
case AArch64::LDRWui:		case AArch64::LDRWui:
case AArch64::LDURWi:		case AArch64::LDURWi:
case AArch64::STRSui:		case AArch64::STRSui:
case AArch64::STURSi:		case AArch64::STURSi:
		case AArch64::STRSpre:
case AArch64::STRWui:		case AArch64::STRWui:
case AArch64::STURWi:		case AArch64::STURWi:
		case AArch64::STRWpre:
case AArch64::LDPSi:		case AArch64::LDPSi:
case AArch64::LDPSWi:		case AArch64::LDPSWi:
case AArch64::LDPWi:		case AArch64::LDPWi:
case AArch64::STPSi:		case AArch64::STPSi:
case AArch64::STPWi:		case AArch64::STPWi:
return 4;		return 4;
case AArch64::LDRDui:		case AArch64::LDRDui:
case AArch64::LDURDi:		case AArch64::LDURDi:
		case AArch64::LDRDpre:
case AArch64::LDRXui:		case AArch64::LDRXui:
case AArch64::LDURXi:		case AArch64::LDURXi:
		case AArch64::LDRXpre:
case AArch64::STRDui:		case AArch64::STRDui:
case AArch64::STURDi:		case AArch64::STURDi:
		case AArch64::STRDpre:
case AArch64::STRXui:		case AArch64::STRXui:
case AArch64::STURXi:		case AArch64::STURXi:
		case AArch64::STRXpre:
case AArch64::LDPDi:		case AArch64::LDPDi:
case AArch64::LDPXi:		case AArch64::LDPXi:
case AArch64::STPDi:		case AArch64::STPDi:
case AArch64::STPXi:		case AArch64::STPXi:
return 8;		return 8;
case AArch64::LDRQui:		case AArch64::LDRQui:
case AArch64::LDURQi:		case AArch64::LDURQi:
case AArch64::STRQui:		case AArch64::STRQui:
case AArch64::STURQi:		case AArch64::STURQi:
		case AArch64::STRQpre:
case AArch64::LDPQi:		case AArch64::LDPQi:
		case AArch64::LDRQpre:
case AArch64::STPQi:		case AArch64::STPQi:
case AArch64::STGOffset:		case AArch64::STGOffset:
case AArch64::STZGOffset:		case AArch64::STZGOffset:
case AArch64::ST2GOffset:		case AArch64::ST2GOffset:
case AArch64::STZ2GOffset:		case AArch64::STZ2GOffset:
case AArch64::STGPi:		case AArch64::STGPi:
return 16;		return 16;
}		}
▲ Show 20 Lines • Show All 57 Lines • ▼ Show 20 Lines	static bool shouldClusterFI(const MachineFrameInfo &MFI, int FI1,
}		}

return FI1 == FI2;		return FI1 == FI2;
}		}

/// Detect opportunities for ldp/stp formation.		/// Detect opportunities for ldp/stp formation.
///		///
/// Only called for LdSt for which getMemOperandWithOffset returns true.		/// Only called for LdSt for which getMemOperandWithOffset returns true.
bool AArch64InstrInfo::shouldClusterMemOps(		bool AArch64InstrInfo::shouldClusterMemOps(
		dmgreenUnsubmitted Not Done Reply Inline Actions Should this get some updates to make it more precise for pre-inc pairs? dmgreen: Should this get some updates to make it more precise for pre-inc pairs?
		stelios-armAuthorUnsubmitted Done Reply Inline Actions Could be. Currently, this method is called for load/stores when `getMemOperandWithOffset` returns `true`. However, `getMemOperandWithOffset` has a call to `getMemOpInfo`: // If this returns false, then it's an instruction we don't want to handle. if (!getMemOpInfo(LdSt.getOpcode(), Scale, Width, Dummy1, Dummy2)) return false; Currently, the `getMemOpInfo` does not include the pre-inc instructions. Additionally, for the post-inc instructions, there is only one variant available for `STRWpost`/`LDRWpost`. I am not sure why the other variants are not included, so I am bit skeptical If the pre-inc instructions should be added. What do you think? stelios-arm: Could be. Currently, this method is called for load/stores when `getMemOperandWithOffset`…
		dmgreenUnsubmitted Not Done Reply Inline Actions We may want to make it more precise, but for the moment it's probably fine so long as it's not going to get anything drastically wrong. It clusters memory during scheduling to potentially increase the number of combined loads later on. dmgreen: We may want to make it more precise, but for the moment it's probably fine so long as it's not…
		stelios-armAuthorUnsubmitted Done Reply Inline Actions I agree, maybe that's the next step. stelios-arm: I agree, maybe that's the next step.
ArrayRef<const MachineOperand *> BaseOps1,		ArrayRef<const MachineOperand *> BaseOps1,
ArrayRef<const MachineOperand *> BaseOps2, unsigned NumLoads,		ArrayRef<const MachineOperand *> BaseOps2, unsigned NumLoads,
unsigned NumBytes) const {		unsigned NumBytes) const {
assert(BaseOps1.size() == 1 && BaseOps2.size() == 1);		assert(BaseOps1.size() == 1 && BaseOps2.size() == 1);
const MachineOperand &BaseOp1 = *BaseOps1.front();		const MachineOperand &BaseOp1 = *BaseOps1.front();
const MachineOperand &BaseOp2 = *BaseOps2.front();		const MachineOperand &BaseOp2 = *BaseOps2.front();
const MachineInstr &FirstLdSt = *BaseOp1.getParent();		const MachineInstr &FirstLdSt = *BaseOp1.getParent();
const MachineInstr &SecondLdSt = *BaseOp2.getParent();		const MachineInstr &SecondLdSt = *BaseOp2.getParent();
▲ Show 20 Lines • Show All 4,454 Lines • Show Last 20 Lines

llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp

Show First 20 Lines • Show All 242 Lines • ▼ Show 20 Lines	if (IsValidLdStrOpc)
*IsValidLdStrOpc = true;		*IsValidLdStrOpc = true;
switch (Opc) {		switch (Opc) {
default:		default:
if (IsValidLdStrOpc)		if (IsValidLdStrOpc)
*IsValidLdStrOpc = false;		*IsValidLdStrOpc = false;
return std::numeric_limits<unsigned>::max();		return std::numeric_limits<unsigned>::max();
case AArch64::STRDui:		case AArch64::STRDui:
case AArch64::STURDi:		case AArch64::STURDi:
		case AArch64::STRDpre:
case AArch64::STRQui:		case AArch64::STRQui:
case AArch64::STURQi:		case AArch64::STURQi:
		case AArch64::STRQpre:
case AArch64::STRBBui:		case AArch64::STRBBui:
case AArch64::STURBBi:		case AArch64::STURBBi:
case AArch64::STRHHui:		case AArch64::STRHHui:
case AArch64::STURHHi:		case AArch64::STURHHi:
case AArch64::STRWui:		case AArch64::STRWui:
		case AArch64::STRWpre:
case AArch64::STURWi:		case AArch64::STURWi:
case AArch64::STRXui:		case AArch64::STRXui:
		case AArch64::STRXpre:
case AArch64::STURXi:		case AArch64::STURXi:
case AArch64::LDRDui:		case AArch64::LDRDui:
case AArch64::LDURDi:		case AArch64::LDURDi:
		case AArch64::LDRDpre:
case AArch64::LDRQui:		case AArch64::LDRQui:
case AArch64::LDURQi:		case AArch64::LDURQi:
		case AArch64::LDRQpre:
case AArch64::LDRWui:		case AArch64::LDRWui:
case AArch64::LDURWi:		case AArch64::LDURWi:
		case AArch64::LDRWpre:
case AArch64::LDRXui:		case AArch64::LDRXui:
case AArch64::LDURXi:		case AArch64::LDURXi:
		case AArch64::LDRXpre:
case AArch64::STRSui:		case AArch64::STRSui:
case AArch64::STURSi:		case AArch64::STURSi:
		case AArch64::STRSpre:
case AArch64::LDRSui:		case AArch64::LDRSui:
case AArch64::LDURSi:		case AArch64::LDURSi:
		case AArch64::LDRSpre:
return Opc;		return Opc;
case AArch64::LDRSWui:		case AArch64::LDRSWui:
return AArch64::LDRWui;		return AArch64::LDRWui;
case AArch64::LDURSWi:		case AArch64::LDURSWi:
return AArch64::LDURWi;		return AArch64::LDURWi;
}		}
}		}

Show All 18 Lines

static unsigned getMatchingPairOpcode(unsigned Opc) {		static unsigned getMatchingPairOpcode(unsigned Opc) {
switch (Opc) {		switch (Opc) {
default:		default:
llvm_unreachable("Opcode has no pairwise equivalent!");		llvm_unreachable("Opcode has no pairwise equivalent!");
case AArch64::STRSui:		case AArch64::STRSui:
case AArch64::STURSi:		case AArch64::STURSi:
return AArch64::STPSi;		return AArch64::STPSi;
		case AArch64::STRSpre:
		return AArch64::STPSpre;
case AArch64::STRDui:		case AArch64::STRDui:
case AArch64::STURDi:		case AArch64::STURDi:
return AArch64::STPDi;		return AArch64::STPDi;
		case AArch64::STRDpre:
		return AArch64::STPDpre;
case AArch64::STRQui:		case AArch64::STRQui:
case AArch64::STURQi:		case AArch64::STURQi:
return AArch64::STPQi;		return AArch64::STPQi;
		case AArch64::STRQpre:
		return AArch64::STPQpre;
case AArch64::STRWui:		case AArch64::STRWui:
case AArch64::STURWi:		case AArch64::STURWi:
return AArch64::STPWi;		return AArch64::STPWi;
		case AArch64::STRWpre:
		return AArch64::STPWpre;
case AArch64::STRXui:		case AArch64::STRXui:
case AArch64::STURXi:		case AArch64::STURXi:
return AArch64::STPXi;		return AArch64::STPXi;
		case AArch64::STRXpre:
		return AArch64::STPXpre;
case AArch64::LDRSui:		case AArch64::LDRSui:
case AArch64::LDURSi:		case AArch64::LDURSi:
return AArch64::LDPSi;		return AArch64::LDPSi;
		case AArch64::LDRSpre:
		return AArch64::LDPSpre;
case AArch64::LDRDui:		case AArch64::LDRDui:
case AArch64::LDURDi:		case AArch64::LDURDi:
return AArch64::LDPDi;		return AArch64::LDPDi;
		case AArch64::LDRDpre:
		return AArch64::LDPDpre;
case AArch64::LDRQui:		case AArch64::LDRQui:
case AArch64::LDURQi:		case AArch64::LDURQi:
return AArch64::LDPQi;		return AArch64::LDPQi;
		case AArch64::LDRQpre:
		return AArch64::LDPQpre;
case AArch64::LDRWui:		case AArch64::LDRWui:
case AArch64::LDURWi:		case AArch64::LDURWi:
return AArch64::LDPWi;		return AArch64::LDPWi;
		case AArch64::LDRWpre:
		return AArch64::LDPWpre;
case AArch64::LDRXui:		case AArch64::LDRXui:
case AArch64::LDURXi:		case AArch64::LDURXi:
return AArch64::LDPXi;		return AArch64::LDPXi;
		case AArch64::LDRXpre:
		return AArch64::LDPXpre;
case AArch64::LDRSWui:		case AArch64::LDRSWui:
case AArch64::LDURSWi:		case AArch64::LDURSWi:
return AArch64::LDPSWi;		return AArch64::LDPSWi;
}		}
}		}

static unsigned isMatchingStore(MachineInstr &LoadInst,		static unsigned isMatchingStore(MachineInstr &LoadInst,
MachineInstr &StoreInst) {		MachineInstr &StoreInst) {
▲ Show 20 Lines • Show All 192 Lines • ▼ Show 20 Lines	static bool isPairedLdSt(const MachineInstr &MI) {
case AArch64::STPQi:		case AArch64::STPQi:
case AArch64::STPWi:		case AArch64::STPWi:
case AArch64::STPXi:		case AArch64::STPXi:
case AArch64::STGPi:		case AArch64::STGPi:
return true;		return true;
}		}
}		}

		static bool isPairedPreLdSt(unsigned int Opc) {
		dmgreenUnsubmitted Done Reply Inline Actions This is only used in one place, where it could be checking isPreLdSt on the original opcode? dmgreen: This is only used in one place, where it could be checking isPreLdSt on the original opcode?
		switch (Opc) {
		default:
		return false;
		case AArch64::STPSpre:
		case AArch64::STPDpre:
		case AArch64::STPQpre:
		case AArch64::STPWpre:
		case AArch64::STPXpre:
		case AArch64::LDPSpre:
		case AArch64::LDPDpre:
		case AArch64::LDPQpre:
		case AArch64::LDPWpre:
		case AArch64::LDPXpre:
		return true;
		}
		}

		static bool isPreLd(const MachineInstr &MI) {
		switch (MI.getOpcode()) {
		default:
		return false;
		case AArch64::LDRWpre:
		case AArch64::LDRXpre:
		case AArch64::LDRSpre:
		case AArch64::LDRDpre:
		case AArch64::LDRQpre:
		return true;
		}
		}

		static bool isPreSt(const MachineInstr &MI) {
		switch (MI.getOpcode()) {
		default:
		return false;
		case AArch64::STRWpre:
		case AArch64::STRXpre:
		case AArch64::STRSpre:
		case AArch64::STRDpre:
		case AArch64::STRQpre:
		return true;
		}
		}

		static bool isPreLdSt(const MachineInstr &MI) {
		return isPreLd(MI) \|\| isPreSt(MI);
		}

		static bool isPreLdStPairCandidate(MachineInstr &FirstMI, MachineInstr &MI) {

		unsigned OpcA = FirstMI.getOpcode();
		unsigned OpcB = MI.getOpcode();

		switch (OpcA) {
		default:
		return false;
		case AArch64::STRSpre:
		return OpcB == AArch64::STRSui;
		case AArch64::STRDpre:
		return OpcB == AArch64::STRDui;
		case AArch64::STRQpre:
		return OpcB == AArch64::STRQui;
		case AArch64::STRWpre:
		return OpcB == AArch64::STRWui;
		case AArch64::STRXpre:
		return OpcB == AArch64::STRXui;
		case AArch64::LDRSpre:
		return OpcB == AArch64::LDRSui;
		case AArch64::LDRDpre:
		return OpcB == AArch64::LDRDui;
		case AArch64::LDRQpre:
		return OpcB == AArch64::LDRQui;
		case AArch64::LDRWpre:
		return OpcB == AArch64::LDRWui;
		case AArch64::LDRXpre:
		return OpcB == AArch64::LDRXui;
		}
		}

// Returns the scale and offset range of pre/post indexed variants of MI.		// Returns the scale and offset range of pre/post indexed variants of MI.
static void getPrePostIndexedMemOpInfo(const MachineInstr &MI, int &Scale,		static void getPrePostIndexedMemOpInfo(const MachineInstr &MI, int &Scale,
int &MinOffset, int &MaxOffset) {		int &MinOffset, int &MaxOffset) {
bool IsPaired = isPairedLdSt(MI);		bool IsPaired = isPairedLdSt(MI);
bool IsTagStore = isTagStore(MI);		bool IsTagStore = isTagStore(MI);
// ST*G and all paired ldst have the same scale in pre/post-indexed variants		// ST*G and all paired ldst have the same scale in pre/post-indexed variants
// as in the "unsigned offset" variant.		// as in the "unsigned offset" variant.
// All other pre/post indexed ldst instructions are unscaled.		// All other pre/post indexed ldst instructions are unscaled.
Scale = (IsTagStore \|\| IsPaired) ? AArch64InstrInfo::getMemScale(MI) : 1;		Scale = (IsTagStore \|\| IsPaired) ? AArch64InstrInfo::getMemScale(MI) : 1;

if (IsPaired) {		if (IsPaired) {
MinOffset = -64;		MinOffset = -64;
MaxOffset = 63;		MaxOffset = 63;
} else {		} else {
MinOffset = -256;		MinOffset = -256;
MaxOffset = 255;		MaxOffset = 255;
}		}
}		}

static MachineOperand &getLdStRegOp(MachineInstr &MI,		static MachineOperand &getLdStRegOp(MachineInstr &MI,
unsigned PairedRegOp = 0) {		unsigned PairedRegOp = 0) {
assert(PairedRegOp < 2 && "Unexpected register operand idx.");		assert(PairedRegOp < 2 && "Unexpected register operand idx.");
unsigned Idx = isPairedLdSt(MI) ? PairedRegOp : 0;		bool IsPreLdSt = isPreLdSt(MI);
		if (IsPreLdSt)
		PairedRegOp = 1;
		dmgreenUnsubmitted Done Reply Inline Actions += 1? dmgreen: += 1?
		unsigned Idx = isPairedLdSt(MI) \|\| IsPreLdSt ? PairedRegOp : 0;
return MI.getOperand(Idx);		return MI.getOperand(Idx);
}		}

static const MachineOperand &getLdStBaseOp(const MachineInstr &MI) {		static const MachineOperand &getLdStBaseOp(const MachineInstr &MI) {
unsigned Idx = isPairedLdSt(MI) ? 2 : 1;		unsigned Idx = isPairedLdSt(MI) \|\| isPreLdSt(MI) ? 2 : 1;
return MI.getOperand(Idx);		return MI.getOperand(Idx);
}		}

static const MachineOperand &getLdStOffsetOp(const MachineInstr &MI) {		static const MachineOperand &getLdStOffsetOp(const MachineInstr &MI) {
unsigned Idx = isPairedLdSt(MI) ? 3 : 2;		unsigned Idx = isPairedLdSt(MI) \|\| isPreLdSt(MI) ? 3 : 2;
return MI.getOperand(Idx);		return MI.getOperand(Idx);
}		}

static bool isLdOffsetInRangeOfSt(MachineInstr &LoadInst,		static bool isLdOffsetInRangeOfSt(MachineInstr &LoadInst,
MachineInstr &StoreInst,		MachineInstr &StoreInst,
const AArch64InstrInfo *TII) {		const AArch64InstrInfo *TII) {
assert(isMatchingStore(LoadInst, StoreInst) && "Expect only matched ld/st.");		assert(isMatchingStore(LoadInst, StoreInst) && "Expect only matched ld/st.");
int LoadSize = TII->getMemScale(LoadInst);		int LoadSize = TII->getMemScale(LoadInst);
▲ Show 20 Lines • Show All 306 Lines • ▼ Show 20 Lines	if (PairedIsUnscaled) {
"Offset should be a multiple of the stride!");		"Offset should be a multiple of the stride!");
PairedOffset /= MemSize;		PairedOffset /= MemSize;
} else {		} else {
PairedOffset *= MemSize;		PairedOffset *= MemSize;
}		}
}		}

// Which register is Rt and which is Rt2 depends on the offset order.		// Which register is Rt and which is Rt2 depends on the offset order.
		// However, for pre load/stores the Rt should be the one of the pre
		// load/store.
MachineInstr RtMI, Rt2MI;		MachineInstr RtMI, Rt2MI;
if (Offset == PairedOffset + OffsetStride) {		if (Offset == PairedOffset + OffsetStride && !isPreLdSt(*I)) {
RtMI = &*Paired;		RtMI = &*Paired;
Rt2MI = &*I;		Rt2MI = &*I;
// Here we swapped the assumption made for SExtIdx.		// Here we swapped the assumption made for SExtIdx.
// I.e., we turn ldp I, Paired into ldp Paired, I.		// I.e., we turn ldp I, Paired into ldp Paired, I.
// Update the index accordingly.		// Update the index accordingly.
if (SExtIdx != -1)		if (SExtIdx != -1)
SExtIdx = (SExtIdx + 1) % 2;		SExtIdx = (SExtIdx + 1) % 2;
} else {		} else {
Show All 28 Lines	if (!MergeForward) {
// STRWui %w1, ...		// STRWui %w1, ...
// USE kill %w1 ; need to clear kill flag when moving STRWui downwards		// USE kill %w1 ; need to clear kill flag when moving STRWui downwards
// STRW %w0		// STRW %w0
Register Reg = getLdStRegOp(*I).getReg();		Register Reg = getLdStRegOp(*I).getReg();
for (MachineInstr &MI : make_range(std::next(I), Paired))		for (MachineInstr &MI : make_range(std::next(I), Paired))
MI.clearRegisterKills(Reg, TRI);		MI.clearRegisterKills(Reg, TRI);
}		}
}		}
MIB = BuildMI(*MBB, InsertionPoint, DL, TII->get(getMatchingPairOpcode(Opc)))
.add(RegOp0)		unsigned int MatchPairOpcode = getMatchingPairOpcode(Opc);
		MIB = BuildMI(*MBB, InsertionPoint, DL, TII->get(MatchPairOpcode));

		// Adds the pre-index operand for pre-indexed ld/st pairs.
		if (isPairedPreLdSt(MatchPairOpcode))
		dmgreenUnsubmitted Done Reply Inline Actions Does this need the second condition in the if? Should that always be true if the first part is a pre? dmgreen: Does this need the second condition in the if? Should that always be true if the first part is…
		stelios-armAuthorUnsubmitted Done Reply Inline Actions If it reaches at this point, the second condition is redundant. Therefore, I will remove it. stelios-arm: If it reaches at this point, the second condition is redundant. Therefore, I will remove it.
		MIB.addReg(BaseRegOp.getReg(), RegState::Define);

		MIB.add(RegOp0)
.add(RegOp1)		.add(RegOp1)
.add(BaseRegOp)		.add(BaseRegOp)
.addImm(OffsetImm)		.addImm(OffsetImm)
		stelios-armAuthorUnsubmitted Done Reply Inline Actions Mis-indentation. I am going to fix this is in an updated revision. stelios-arm: Mis-indentation. I am going to fix this is in an updated revision.
.cloneMergedMemRefs({&I, &Paired})		.cloneMergedMemRefs({&I, &Paired})
.setMIFlags(I->mergeFlagsWith(*Paired));		.setMIFlags(I->mergeFlagsWith(*Paired));

(void)MIB;		(void)MIB;

LLVM_DEBUG(		LLVM_DEBUG(
dbgs() << "Creating pair load/store. Replacing instructions:\n ");		dbgs() << "Creating pair load/store. Replacing instructions:\n ");
LLVM_DEBUG(I->print(dbgs()));		LLVM_DEBUG(I->print(dbgs()));
LLVM_DEBUG(dbgs() << " ");		LLVM_DEBUG(dbgs() << " ");
LLVM_DEBUG(Paired->print(dbgs()));		LLVM_DEBUG(Paired->print(dbgs()));
▲ Show 20 Lines • Show All 262 Lines • ▼ Show 20 Lines
static bool areCandidatesToMergeOrPair(MachineInstr &FirstMI, MachineInstr &MI,		static bool areCandidatesToMergeOrPair(MachineInstr &FirstMI, MachineInstr &MI,
LdStPairFlags &Flags,		LdStPairFlags &Flags,
const AArch64InstrInfo *TII) {		const AArch64InstrInfo *TII) {
// If this is volatile or if pairing is suppressed, not a candidate.		// If this is volatile or if pairing is suppressed, not a candidate.
if (MI.hasOrderedMemoryRef() \|\| TII->isLdStPairSuppressed(MI))		if (MI.hasOrderedMemoryRef() \|\| TII->isLdStPairSuppressed(MI))
return false;		return false;

// We should have already checked FirstMI for pair suppression and volatility.		// We should have already checked FirstMI for pair suppression and volatility.
assert(!FirstMI.hasOrderedMemoryRef() &&		// FIXME: Ignore if the instruction is LDR<S,D,Q,W,X>pre. This is currently
!TII->isLdStPairSuppressed(FirstMI) &&		// required because the "::(load x)" memory operand is missing from
		// LDR<S,D,Q,W,X>pre instructions.
		assert(((!FirstMI.hasOrderedMemoryRef() &&
		stelios-armAuthorUnsubmitted Done Reply Inline Actions Ditto. stelios-arm: Ditto.
		!TII->isLdStPairSuppressed(FirstMI)) \|\|
		isPreLd(FirstMI)) &&
"FirstMI shouldn't get here if either of these checks are true.");		"FirstMI shouldn't get here if either of these checks are true.");

unsigned OpcA = FirstMI.getOpcode();		unsigned OpcA = FirstMI.getOpcode();
unsigned OpcB = MI.getOpcode();		unsigned OpcB = MI.getOpcode();

// Opcodes match: nothing more to check.		// Opcodes match: nothing more to check.
if (OpcA == OpcB)		if (OpcA == OpcB)
return true;		return true;
		dmgreenUnsubmitted Done Reply Inline Actions Maybe phrase this as "Opcodes match: If the opcodes are pre ld/st there is nothing more to check." dmgreen: Maybe phrase this as "Opcodes match: If the opcodes are pre ld/st there is nothing more to…

// Try to match a sign-extended load/store with a zero-extended load/store.		// Try to match a sign-extended load/store with a zero-extended load/store.
bool IsValidLdStrOpc, PairIsValidLdStrOpc;		bool IsValidLdStrOpc, PairIsValidLdStrOpc;
unsigned NonSExtOpc = getMatchingNonSExtOpcode(OpcA, &IsValidLdStrOpc);		unsigned NonSExtOpc = getMatchingNonSExtOpcode(OpcA, &IsValidLdStrOpc);
assert(IsValidLdStrOpc &&		assert(IsValidLdStrOpc &&
"Given Opc should be a Load or Store with an immediate");		"Given Opc should be a Load or Store with an immediate");
// OpcA will be the first instruction in the pair.		// OpcA will be the first instruction in the pair.
if (NonSExtOpc == getMatchingNonSExtOpcode(OpcB, &PairIsValidLdStrOpc)) {		if (NonSExtOpc == getMatchingNonSExtOpcode(OpcB, &PairIsValidLdStrOpc)) {
Flags.setSExtIdx(NonSExtOpc == (unsigned)OpcA ? 1 : 0);		Flags.setSExtIdx(NonSExtOpc == (unsigned)OpcA ? 1 : 0);
return true;		return true;
}		}

// If the second instruction isn't even a mergable/pairable load/store, bail		// If the second instruction isn't even a mergable/pairable load/store, bail
// out.		// out.
if (!PairIsValidLdStrOpc)		if (!PairIsValidLdStrOpc)
return false;		return false;

// FIXME: We don't support merging narrow stores with mixed scaled/unscaled		// FIXME: We don't support merging narrow stores with mixed scaled/unscaled
// offsets.		// offsets.
if (isNarrowStore(OpcA) \|\| isNarrowStore(OpcB))		if (isNarrowStore(OpcA) \|\| isNarrowStore(OpcB))
return false;		return false;

		// The STR<S,D,Q,W,X>pre - STR<S,D,Q,W,X>ui and
		// LDR<S,D,Q,W,X>pre-LDR<S,D,Q,W,X>ui
		// are candidate pairs that can be merged.
		if (isPreLdStPairCandidate(FirstMI, MI))
		return true;

// Try to match an unscaled load/store with a scaled load/store.		// Try to match an unscaled load/store with a scaled load/store.
return TII->isUnscaledLdSt(OpcA) != TII->isUnscaledLdSt(OpcB) &&		return TII->isUnscaledLdSt(OpcA) != TII->isUnscaledLdSt(OpcB) &&
getMatchingPairOpcode(OpcA) == getMatchingPairOpcode(OpcB);		getMatchingPairOpcode(OpcA) == getMatchingPairOpcode(OpcB);

// FIXME: Can we also match a mixed sext/zext unscaled/scaled pair?		// FIXME: Can we also match a mixed sext/zext unscaled/scaled pair?
}		}

static bool		static bool
▲ Show 20 Lines • Show All 244 Lines • ▼ Show 20 Lines	if (areCandidatesToMergeOrPair(FirstMI, MI, Flags, TII) &&
continue;		continue;
}		}
MIOffset /= MemSize;		MIOffset /= MemSize;
} else {		} else {
MIOffset *= MemSize;		MIOffset *= MemSize;
}		}
}		}

		// If the offset of the second ld/st is not equal to the size of the
		// destination register it can’t be paired with a pre-index ld/st
		// pair. Additionaly if the base reg is used or modified the operatons
		dmgreenUnsubmitted Done Reply Inline Actions -> Additionally -> operations dmgreen: -> Additionally -> operations
		// can't be paired: bail and keep looking.
		if (isPreLdStPairCandidate(FirstMI, MI)) {
		bool IsOutOfBounds = MIOffset != TII->getMemScale(MI);
		bool IsBaseRegUsed =
		!UsedRegUnits.available(getLdStBaseOp(MI).getReg());
		bool IsBaseRegModified =
		!ModifiedRegUnits.available(getLdStBaseOp(MI).getReg());
		if (IsOutOfBounds \|\| IsBaseRegUsed \|\| IsBaseRegModified) {
		LiveRegUnits::accumulateUsedDefed(MI, ModifiedRegUnits, UsedRegUnits,
		TRI);
		MemInsns.push_back(&MI);
		continue;
		}
		}

		// The Offset check for isPreLdStPairCandidates is ignored because the
		// only requirement is that the offset of the second ld/st instruction is
		// equal to the size of the destination register, and that is checked
		// above.
if (BaseReg == MIBaseReg && ((Offset == MIOffset + OffsetStride) \|\|		if (BaseReg == MIBaseReg && ((Offset == MIOffset + OffsetStride) \|\|
		dmgreenUnsubmitted Not Done Reply Inline Actions Why does this not already handle the combining of PreLdStPair? The existing code can combine in both directions. Presumably it's only valid for forward now? dmgreen: Why does this not already handle the combining of PreLdStPair? The existing code can combine…
		stelios-armAuthorUnsubmitted Done Reply Inline Actions For example, the following instructions: str w1 [x0, #20]! str w2 [x0, #4] Can be paired to: Stp w1, w2, [x0, #20]! The offset of the first and second instruction is `20` and `4`, respectively. The offset stride is `4`. Therefore, the check `(Offset == MIOffset + OffsetStride)` and `(Offset + OffsetStride == MIOffset)` will return `false`. That’s is why it’s needed. And yes, for such cases it’s only valid for forward now, since the order of the instructions matters for this optimization. stelios-arm: For example, the following instructions: ``` str w1 [x0, #20]! str w2 [x0, #4] ``` Can be…
		dmgreenUnsubmitted Done Reply Inline Actions OK, but it looks like the existing `(Offset == MIOffset + OffsetStride)` conditions could be true for preinc where we don't want them to be. Can we switch it around to be something like: bools IsPreLdSt = isPreLdStPairCandidate(..) if (!IsPreLdSt) { check conditions else continue } else { check pre conditions else continue } That way we don't need the extra indenting, and the conditions don't get muddled together. dmgreen: OK, but it looks like the existing `(Offset == MIOffset + OffsetStride)` conditions could be…
(Offset + OffsetStride == MIOffset))) {		(Offset + OffsetStride == MIOffset) \|\|
		isPreLdStPairCandidate(FirstMI, MI))) {
int MinOffset = Offset < MIOffset ? Offset : MIOffset;		int MinOffset = Offset < MIOffset ? Offset : MIOffset;
if (FindNarrowMerge) {		if (FindNarrowMerge) {
// If the alignment requirements of the scaled wide load/store		// If the alignment requirements of the scaled wide load/store
// instruction can't express the offset of the scaled narrow input,		// instruction can't express the offset of the scaled narrow input,
// bail and keep looking. For promotable zero stores, allow only when		// bail and keep looking. For promotable zero stores, allow only when
// the stored value is the same (i.e., WZR).		// the stored value is the same (i.e., WZR).
if ((!IsUnscaled && alignTo(MinOffset, 2) != MinOffset) \|\|		if ((!IsUnscaled && alignTo(MinOffset, 2) != MinOffset) \|\|
(IsPromotableZeroStore && Reg != getLdStRegOp(MI).getReg())) {		(IsPromotableZeroStore && Reg != getLdStRegOp(MI).getReg())) {
▲ Show 20 Lines • Show All 665 Lines • Show Last 20 Lines

llvm/test/CodeGen/AArch64/arm64-memset-inline.ll

Show First 20 Lines • Show All 58 Lines • ▼ Show 20 Lines	; CHECK-NEXT: bl something
%buf = alloca [4 x i8], align 1		%buf = alloca [4 x i8], align 1
%cast = bitcast [4 x i8]* %buf to i8*		%cast = bitcast [4 x i8]* %buf to i8*
call void @llvm.memset.p0i8.i32(i8* %cast, i8 0, i32 4, i1 false)		call void @llvm.memset.p0i8.i32(i8* %cast, i8 0, i32 4, i1 false)
call void @something(i8* %cast)		call void @something(i8* %cast)
ret void		ret void
}		}

define void @bzero_8_stack() {		define void @bzero_8_stack() {
; CHECK-LABEL: bzero_8_stack:		; CHECK-LABEL: bzero_8_stack:
stelios-armAuthorUnsubmitted Done Reply Inline Actions In case you are wondering why with the new patch this is changed to and STP here is the full check commands pre-patch: ; CHECK-LABEL: bzero_8_stack: ; CHECK: // %bb.0: ; CHECK-NEXT: str x30, [sp, #-16]! // 8-byte Folded Spill ; CHECK-NEXT: .cfi_def_cfa_offset 16 ; CHECK-NEXT: .cfi_offset w30, -16 ; CHECK-NEXT: add x0, sp, #8 // =8 ; CHECK-NEXT: str xzr, [sp, #8] ; CHECK-NEXT: bl something ; CHECK-NEXT: ldr x30, [sp], #16 // 8-byte Folded Reload ; CHECK-NEXT: ret stelios-arm: In case you are wondering why with the new patch this is changed to and STP here is the full…
; CHECK: str xzr, [sp, #8]		; CHECK: stp x30, xzr, [sp, #-16]!
; CHECK-NEXT: bl something		; CHECK: bl something
%buf = alloca [8 x i8], align 1		%buf = alloca [8 x i8], align 1
%cast = bitcast [8 x i8]* %buf to i8*		%cast = bitcast [8 x i8]* %buf to i8*
call void @llvm.memset.p0i8.i32(i8* %cast, i8 0, i32 8, i1 false)		call void @llvm.memset.p0i8.i32(i8* %cast, i8 0, i32 8, i1 false)
call void @something(i8* %cast)		call void @something(i8* %cast)
ret void		ret void
}		}

define void @bzero_12_stack() {		define void @bzero_12_stack() {
▲ Show 20 Lines • Show All 147 Lines • ▼ Show 20 Lines	; CHECK-NEXT: bl something
%buf = alloca [4 x i8], align 1		%buf = alloca [4 x i8], align 1
%cast = bitcast [4 x i8]* %buf to i8*		%cast = bitcast [4 x i8]* %buf to i8*
call void @llvm.memset.p0i8.i32(i8* %cast, i8 -86, i32 4, i1 false)		call void @llvm.memset.p0i8.i32(i8* %cast, i8 -86, i32 4, i1 false)
call void @something(i8* %cast)		call void @something(i8* %cast)
ret void		ret void
}		}

define void @memset_8_stack() {		define void @memset_8_stack() {
; CHECK-LABEL: memset_8_stack:		; CHECK-LABEL: memset_8_stack:
stelios-armAuthorUnsubmitted Done Reply Inline Actions Similarly: ; CHECK-LABEL: memset_8_stack: ; CHECK: // %bb.0: ; CHECK-NEXT: str x30, [sp, #-16]! // 8-byte Folded Spill ; CHECK-NEXT: .cfi_def_cfa_offset 16 ; CHECK-NEXT: .cfi_offset w30, -16 ; CHECK-NEXT: mov x8, #-6148914691236517206 ; CHECK-NEXT: add x0, sp, #8 // =8 ; CHECK-NEXT: str x8, [sp, #8] ; CHECK-NEXT: bl something ; CHECK-NEXT: ldr x30, [sp], #16 // 8-byte Folded Reload ; CHECK-NEXT: ret stelios-arm: Similarly: ``` ; CHECK-LABEL: memset_8_stack: ; CHECK: // %bb.0: ; CHECK-NEXT: str…
; CHECK: mov x8, #-6148914691236517206		; CHECK: mov x8, #-6148914691236517206
; CHECK-NEXT: add x0, sp, #8		; CHECK-NEXT: stp x30, x8, [sp, #-16]!
; CHECK-NEXT: str x8, [sp, #8]		; CHECK-NEXT: add x0, sp, #8 // =8
; CHECK-NEXT: bl something		; CHECK-NEXT: bl something
%buf = alloca [8 x i8], align 1		%buf = alloca [8 x i8], align 1
%cast = bitcast [8 x i8]* %buf to i8*		%cast = bitcast [8 x i8]* %buf to i8*
call void @llvm.memset.p0i8.i32(i8* %cast, i8 -86, i32 8, i1 false)		call void @llvm.memset.p0i8.i32(i8* %cast, i8 -86, i32 8, i1 false)
call void @something(i8* %cast)		call void @something(i8* %cast)
ret void		ret void
}		}

▲ Show 20 Lines • Show All 154 Lines • Show Last 20 Lines

llvm/test/CodeGen/AArch64/ldrpre-ldr-merge.mir

This file was added.

				# RUN: llc -o - %s -mtriple=aarch64-none-eabi -mcpu=cortex-a55 -lsr-preferred-addressing-mode=preindexed -stop-after=aarch64-ldst-opt \| FileCheck %s

				---
				name: 1-ldrwpre-ldrwui-merge
				tracksRegLiveness: true
				liveins:
				- { reg: '$x1' }
				- { reg: '$w0' }
				- { reg: '$w1' }
				machineFunctionInfo:
				hasRedZone: false
				body: \|
				bb.0:
				liveins: $w0, $w1, $x1
				; CHECK-LABEL: name: 1-ldrwpre-ldrwui-merge
				; CHECK: liveins: $w0, $w1, $x1
				; CHECK: early-clobber $x1, renamable $w0, renamable $w1 = LDPWpre renamable $x1, 5
				; CHECK: STPWi renamable $w0, renamable $w1, renamable $x1, 0 :: (store 4)
				dmgreenUnsubmitted Done Reply Inline Actions Although I don't think it's an issue with this patch exactly, should this not have two MMO's? Either be :: (load 8) or :: (load 4) (load 4) ? dmgreen: Although I don't think it's an issue with this patch exactly, should this not have two MMO's?
				stelios-armAuthorUnsubmitted Done Reply Inline Actions Yes, it should have "(load 4), (load 4)" instead. stelios-arm: Yes, it should have "(load 4), (load 4)" instead.
				dmgreenUnsubmitted Not Done Reply Inline Actions OK. It's because they are identical, so are elided during the merging of memory operands. It's a little strange to not see both, but fine. You don't get any extra information out of having both in this case. dmgreen: OK. It's because they are identical, so are elided during the merging of memory operands. It's…
				; CHECK: RET undef $lr
				early-clobber renamable $x1, renamable $w0 = LDRWpre killed renamable $x1, 20
				renamable $w1 = LDRWui renamable $x1, 1 :: (load 4)
				STRWui killed renamable $w0, renamable $x1, 0 :: (store 4)
				STRWui killed renamable $w1, renamable $x1, 1 :: (store 4)
				RET undef $lr
				...


				---
				name: 2-ldrxpre-ldrxui-merge
				tracksRegLiveness: true
				liveins:
				- { reg: '$x1' }
				- { reg: '$x2' }
				- { reg: '$x3' }
				machineFunctionInfo:
				hasRedZone: false
				body: \|
				bb.0:
				liveins: $x2, $x3, $x1
				; CHECK-LABEL: name: 2-ldrxpre-ldrxui-merge
				; CHECK: liveins: $x1, $x2, $x3
				; CHECK: early-clobber $x1, renamable $x2, renamable $x3 = LDPXpre renamable $x1, 3
				; CHECK: STPXi renamable $x2, renamable $x3, renamable $x1, 0 :: (store 8)
				; CHECK: RET undef $lr
				early-clobber renamable $x1, renamable $x2 = LDRXpre killed renamable $x1, 24
				renamable $x3 = LDRXui renamable $x1, 1 :: (load 8)
				STRXui killed renamable $x2, renamable $x1, 0 :: (store 8)
				STRXui killed renamable $x3, renamable $x1, 1 :: (store 8)
				RET undef $lr
				...


				---
				name: 3-ldrspre-ldrwui-merge
				tracksRegLiveness: true
				liveins:
				- { reg: '$x1' }
				- { reg: '$s0' }
				- { reg: '$s1' }
				machineFunctionInfo:
				hasRedZone: false
				body: \|
				bb.0:
				liveins: $s0, $s1, $x1
				; CHECK-LABEL: name: 3-ldrspre-ldrwui-merge
				; CHECK: liveins: $s0, $s1, $x1
				; CHECK: early-clobber $x1, renamable $s0, renamable $s1 = LDPSpre renamable $x1, 3
				; CHECK: STRSui renamable $s0, renamable $x1, 0 :: (store 4)
				; CHECK: STRSui renamable $s1, renamable $x1, 1 :: ("aarch64-suppress-pair" store 4)
				; CHECK: RET undef $lr
				early-clobber renamable $x1, renamable $s0 = LDRSpre killed renamable $x1, 12
				renamable $s1 = LDRSui renamable $x1, 1 :: (load 4)
				STRSui killed renamable $s0, renamable $x1, 0 :: (store 4)
				STRSui killed renamable $s1, renamable $x1, 1 :: (store 4)
				RET undef $lr
				...




				---
				name: 4-ldrqdre-ldrdui-merge
				tracksRegLiveness: true
				liveins:
				- { reg: '$x1' }
				- { reg: '$d0' }
				- { reg: '$d1' }
				machineFunctionInfo:
				hasRedZone: false
				body: \|
				bb.0:
				liveins: $d0, $d1, $x1
				; CHECK-LABEL: name: 4-ldrqdre-ldrdui-merge
				; CHECK: liveins: $d0, $d1, $x1
				; CHECK: early-clobber $x1, renamable $d0, renamable $d1 = LDPDpre renamable $x1, 16
				; CHECK: STRDui renamable $d0, renamable $x1, 0 :: (store 8)
				; CHECK: STRDui renamable $d1, renamable $x1, 1 :: ("aarch64-suppress-pair" store 8)
				; CHECK: RET undef $lr
				early-clobber renamable $x1, renamable $d0 = LDRDpre killed renamable $x1, 128
				renamable $d1 = LDRDui renamable $x1, 1 :: (load 8)
				STRDui killed renamable $d0, renamable $x1, 0 :: (store 8)
				STRDui killed renamable $d1, renamable $x1, 1 :: (store 8)
				RET undef $lr
				...


				---
				name: 5-ldrqpre-ldrqui-merge
				alignment: 4
				tracksRegLiveness: true
				liveins:
				- { reg: '$x1' }
				- { reg: '$q0' }
				- { reg: '$q1' }
				frameInfo:
				maxAlignment: 1
				maxCallFrameSize: 0
				machineFunctionInfo:
				hasRedZone: false
				body: \|
				bb.0:
				liveins: $q0, $q1, $x1
				; CHECK-LABEL: name: 5-ldrqpre-ldrqui-merge
				; CHECK: liveins: $q0, $q1, $x1
				; CHECK: early-clobber $x1, renamable $q0, renamable $q1 = LDPQpre renamable $x1, 3
				; CHECK: STPQi renamable $q0, renamable $q1, renamable $x1, 0 :: (store 16)
				; CHECK: RET undef $lr
				early-clobber renamable $x1, renamable $q0 = LDRQpre killed renamable $x1, 48
				renamable $q1 = LDRQui renamable $x1, 1 :: (load 16)
				STRQui killed renamable $q0, renamable $x1, 0 :: (store 16)
				STRQui killed renamable $q1, renamable $x1, 1 :: (store 16)
				RET undef $lr
				...
				dmgreenUnsubmitted Done Reply Inline Actions Comments can be added here with `;` dmgreen: Comments can be added here with `;`
				dmgreenUnsubmitted Not Done Reply Inline Actions store 8 dmgreen: store 8
				dmgreenUnsubmitted Done Reply Inline Actions According to this the limit of a LDP is 1008: https://godbolt.org/z/613xsozqP. So 1024 is the first multiple of 16 that should be invalid. dmgreen: According to this the limit of a LDP is 1008: https://godbolt.org/z/613xsozqP. So 1024 is the…
				stelios-armAuthorUnsubmitted Done Reply Inline Actions According to this, the offset of LDR<>pre/STR<>pre is in range `[-256,255]`. Assuming that `x∈[-256,255]` and `c = size of <S,D,Q,W,X>` , for this type of optimization the resulted LDP<>pre/STP<>pre will be in range of `[min(x mod c == 0), max(x mod c == 0)]`. stelios-arm: According to [[ https://godbolt.org/z/a65v3vzcT \| this ]], the offset of LDR<>pre/STR<>pre is…

llvm/test/CodeGen/AArch64/strpre-str-merge.mir

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
				# RUN: llc -o - %s -mtriple=aarch64-none-eabi -mcpu=cortex-a55 -lsr-preferred-addressing-mode=preindexed -stop-after=aarch64-ldst-opt \| FileCheck %s

				---
				name: 1-strwpre-strwui-merge
				alignment: 4
				tracksRegLiveness: true
				liveins:
				- { reg: '$x0' }
				- { reg: '$w1' }
				- { reg: '$w2' }
				frameInfo:
				maxAlignment: 1
				maxCallFrameSize: 0
				machineFunctionInfo:
				hasRedZone: false
				body: \|
				bb.0.entry:
				liveins: $w1, $w2, $x0
				; CHECK-LABEL: name: 1-strwpre-strwui-merge
				; CHECK: liveins: $w1, $w2, $x0
				; CHECK: early-clobber $x0 = STPWpre renamable $w1, renamable $w2, renamable $x0, 5 :: (store 4)
				; CHECK: RET undef $lr, implicit $x0
				early-clobber renamable $x0 = STRWpre killed renamable $w1, killed renamable $x0, 20 :: (store 4)
				STRWui killed renamable $w2, renamable $x0, 1 :: (store 4)
				RET undef $lr, implicit $x0

				...


				---
				name: 2-strxpre-strxui-merge
				alignment: 4
				tracksRegLiveness: true
				liveins:
				- { reg: '$x0' }
				- { reg: '$x1' }
				- { reg: '$x2' }
				frameInfo:
				maxAlignment: 1
				maxCallFrameSize: 0
				machineFunctionInfo:
				hasRedZone: false
				body: \|
				bb.0.entry:
				liveins: $x0, $x1, $x2

				; CHECK-LABEL: name: 2-strxpre-strxui-merge
				; CHECK: liveins: $x0, $x1, $x2
				; CHECK: early-clobber $x0 = STPXpre renamable $x1, renamable $x2, renamable $x0, 3 :: (store 8)
				; CHECK: RET undef $lr, implicit $x0
				early-clobber renamable $x0 = STRXpre killed renamable $x1, killed renamable $x0, 24 :: (store 8)
				STRXui killed renamable $x2, renamable $x0, 1 :: (store 8)
				RET undef $lr, implicit $x0

				...


				---
				name: 3-strspre-strsui-merge
				alignment: 4
				tracksRegLiveness: true
				liveins:
				- { reg: '$x0' }
				- { reg: '$s0' }
				- { reg: '$s1' }
				frameInfo:
				maxAlignment: 1
				maxCallFrameSize: 0
				machineFunctionInfo:
				hasRedZone: false
				body: \|
				bb.0.entry:
				liveins: $s0, $s1, $x0
				; CHECK-LABEL: name: 3-strspre-strsui-merge
				; CHECK: liveins: $s0, $s1, $x0
				; CHECK: early-clobber $x0 = STPSpre renamable $s0, renamable $s1, renamable $x0, 3 :: (store 4)
				; CHECK: RET undef $lr, implicit $x0
				early-clobber renamable $x0 = STRSpre killed renamable $s0, killed renamable $x0, 12 :: (store 4)
				STRSui killed renamable $s1, renamable $x0, 1 :: (store 4)
				RET undef $lr, implicit $x0
				...

				---
				name: 4-strdpre-strdui-merge
				alignment: 4
				tracksRegLiveness: true
				liveins:
				- { reg: '$x0' }
				- { reg: '$d0' }
				- { reg: '$d1' }
				frameInfo:
				maxAlignment: 1
				maxCallFrameSize: 0
				machineFunctionInfo:
				hasRedZone: false
				body: \|
				bb.0.entry:
				liveins: $d0, $d1, $x0

				; CHECK-LABEL: name: 4-strdpre-strdui-merge
				; CHECK: liveins: $d0, $d1, $x0
				; CHECK: early-clobber $x0 = STPDpre renamable $d0, renamable $d1, renamable $x0, 16 :: (store 8)
				; CHECK: RET undef $lr, implicit $x0
				early-clobber renamable $x0 = STRDpre killed renamable $d0, killed renamable $x0, 128 :: (store 8)
				STRDui killed renamable $d1, renamable $x0, 1 :: (store 8)
				RET undef $lr, implicit $x0

				...

				---
				name: 5-strqpre-strqui-merge
				alignment: 4
				tracksRegLiveness: true
				liveins:
				- { reg: '$x0' }
				- { reg: '$q0' }
				- { reg: '$q1' }
				frameInfo:
				maxAlignment: 1
				maxCallFrameSize: 0
				machineFunctionInfo:
				hasRedZone: false
				body: \|
				bb.0.entry:
				liveins: $q0, $q1, $x0

				; CHECK-LABEL: name: 5-strqpre-strqui-merge
				; CHECK: liveins: $q0, $q1, $x0
				; CHECK: early-clobber $x0 = STPQpre renamable $q0, renamable $q1, renamable $x0, 3 :: (store 16)
				; CHECK: RET undef $lr, implicit $x0
				early-clobber renamable $x0 = STRQpre killed renamable $q0, killed renamable $x0, 48 :: (store 16)
				STRQui killed renamable $q1, renamable $x0, 1 :: (store 16)
				RET undef $lr, implicit $x0

				...
				dmgreenUnsubmitted Done Reply Inline Actions What happens if this is 16? (Or 4) dmgreen: What happens if this is 16? (Or 4)
				stelios-armAuthorUnsubmitted Done Reply Inline Actions It exhibits the same behaviour. stelios-arm: It exhibits the same behaviour.
				stelios-armAuthorUnsubmitted Done Reply Inline Actions They will not be merged. stelios-arm: They will not be merged.
				dmgreenUnsubmitted Done Reply Inline Actions -257? If the strp has to be aligned, is it worth adding a test for +/-260 too? dmgreen: -257? If the strp has to be aligned, is it worth adding a test for +/-260 too?
				dmgreenUnsubmitted Done Reply Inline Actions store 4 dmgreen: store 4
				dmgreenUnsubmitted Done Reply Inline Actions I tried to come up with a list of tests. You have most of the covered, I also came up with these, some of which might be good to make sure are covered: Given ldrqpre a, [b, c] and ldrqui d, [e, f] q with d load with store a == b? No sure what happens then b != e. That's probably tested naturally anyway. second instruction is ldruqui. There were some others but they sound less useful. dmgreen: I tried to come up with a list of tests. You have most of the covered, I also came up with…

This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Adds a pre-indexed paired Load/Store optimization for LDR-STR.ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 336390

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp

llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp

llvm/test/CodeGen/AArch64/arm64-memset-inline.ll

llvm/test/CodeGen/AArch64/ldrpre-ldr-merge.mir

llvm/test/CodeGen/AArch64/strpre-str-merge.mir

[AArch64] Adds a pre-indexed paired Load/Store optimization for LDR-STR.
ClosedPublic