This patch merges STR&lt;S,D,Q,W,X&gt;pre + STR&lt;S,D,Q,W,X&gt;ui and LDR&lt;S,D,Q,W,X&gt;pre + LDR&lt;S,D,Q,W,X&gt;ui instruction pairs into a single STP&lt;S,D,Q,W,X&gt;pre or LDP&lt;S,D,Q,W,X&gt;pre instruction, respectively.
For each pair, there is a MIR test that verifies this optimization.
This was a missed opportunity in the AArch64 load/store optimizer for cases such as:
  #define float32_t float
  #define uint32_t unsigned

  void test(float32_t *S, float32_t *D, uint32_t N) {
    for (uint32_t i = 0; i < N; i++) {
      D[i] = D[i] + S[i];
    }
  }
When compiled with:
  -Ofast -target aarch64-arm-none-eabi -mcpu=cortex-a55 -mllvm -lsr-preferred-addressing-mode=preindexed
it results in:
  .LBB0_9:                          // =>This Inner Loop Header: Depth=1
      ldr   q0, [x11, #32]!
      ldr   q1, [x11, #16]
      subs  x12, x12, #8            // =8
      ldr   q2, [x10, #32]!
      ldr   q3, [x10, #16]
      fadd  v0.4s, v2.4s, v0.4s
      fadd  v1.4s, v3.4s, v1.4s
      stp   q0, q1, [x11]
      b.ne  .LBB0_9
where:
  ldr   q0, [x11, #32]!
  ldr   q1, [x11, #16]
should be:
  ldp   q0, q1, [x11, #32]!
Additionally for cases like:
  define <4 x i32>* @strqpre-strqui-merge(<4 x i32>* %p, <4 x i32> %a, <4 x i32> %b) {
  entry:
    %p0 = getelementptr <4 x i32>, <4 x i32>* %p, i32 2
    store <4 x i32> %a, <4 x i32>* %p0
    %p1 = getelementptr <4 x i32>, <4 x i32>* %p, i32 3
    store <4 x i32> %b, <4 x i32>* %p1
    ret <4 x i32>* %p0
  }
It results in:
"strqpre-strqui-merge": // @strqpre-strqui-merge str q0, [x0, #32]! str q1, [x0, #16] ret
where the two store instructions should be merged into:
  stp   q0, q1, [x0, #32]!
This patch covers such cases for the various forms of STR&lt;&gt;pre/LDR&lt;&gt;pre.
I feel like the "Unscaled" instructions are a set of instructions in their own right. Can we rename the function to something like hasUnscaledLdStOffset to make its meaning clear?