This is an archive of the discontinued LLVM Phabricator instance.

Differential D152407

[AArch64] Merge LDRSWpre-LD[U]RSW pair into LDPSWpre.
ClosedPublic

Authored by chaosdefinition on Jun 7 2023, 4:18 PM.

Details

Reviewers

stelios-arm
SjoerdMeijer
dmgreen
fhahn
mcrosier
junbuml
t.p.northover
MatzeB
zjaffal

Commits

rGbcc5b48b0f24: Reapply "[AArch64] Merge LDRSWpre-LD[U]RSW pair into LDPSWpre"
rGb0093e13fcfd: [AArch64] Merge LDRSWpre-LD[U]RSW pair into LDPSWpre

Summary

This patch optimizes a pair of LDRSWpre and LDRSWui (or LDURSWi) instructions into a single LDPSWpre instruction. This is a missing case in D99272.

MIR test cases in D152564 are updated to verify the optimization.
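The equivalence this patch relies on can be sketched with a toy model. This is only an illustration under stated assumptions: `Machine`, `makeDemoMachine`, `separateLoads` and `pairedLoad` are hypothetical names, not LLVM APIs, and memory is modelled as a flat byte vector read in host byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Toy machine state (hypothetical): one base register and a flat memory.
struct Machine {
  std::vector<uint8_t> Mem;
  uint64_t Base = 0;

  // Read a 32-bit word at Addr and sign-extend it to 64 bits, as LDRSW does.
  int64_t loadSExtWord(uint64_t Addr) const {
    int32_t Word;
    std::memcpy(&Word, &Mem[Addr], sizeof(Word));
    return Word;
  }
};

struct Loaded {
  int64_t First, Second;
  uint64_t BaseAfter;
};

// Demo memory: the words -5 and 7 stored at byte offsets 40 and 44.
inline Machine makeDemoMachine() {
  Machine M;
  M.Mem.assign(64, 0);
  int32_t A = -5, B = 7;
  std::memcpy(&M.Mem[40], &A, sizeof(A));
  std::memcpy(&M.Mem[44], &B, sizeof(B));
  return M;
}

// ldrsw xA, [base, #ByteOff]!   followed by   ldrsw xB, [base, #4]
inline Loaded separateLoads(Machine M, int64_t ByteOff) {
  M.Base += ByteOff;                          // pre-index writeback
  int64_t A = M.loadSExtWord(M.Base);         // first load at the new base
  int64_t B = M.loadSExtWord(M.Base + 4);     // second load, base unchanged
  return {A, B, M.Base};
}

// ldpsw xA, xB, [base, #ByteOff]!
inline Loaded pairedLoad(Machine M, int64_t ByteOff) {
  M.Base += ByteOff;                          // single writeback
  return {M.loadSExtWord(M.Base), M.loadSExtWord(M.Base + 4), M.Base};
}
```

Both functions leave the same values in the destination registers and the same updated base, which is the property the merged LDPSWpre has to preserve.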

Diff Detail

Event Timeline

chaosdefinition created this revision. Jun 7 2023, 4:18 PM

Herald added a project: Restricted Project. Jun 7 2023, 4:18 PM

Herald added subscribers: StephenFan, hiraditya, kristof.beyls.

chaosdefinition requested review of this revision. Jun 7 2023, 4:18 PM

Herald added a project: Restricted Project. Jun 7 2023, 4:18 PM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B237391: Diff 529461. Jun 7 2023, 5:30 PM

dmgreen added a reviewer: zjaffal. Jun 7 2023, 11:33 PM

fhahn added inline comments. Jun 8 2023, 4:02 AM

llvm/test/CodeGen/AArch64/ldrpre-ldr-merge.mir
608	I think we should also have a test case with `LDURSWi` and variants where `LDRSWpre` is the second merge candidate. Preferably the tests will be submitted separately first.

chaosdefinition added inline comments. Jun 8 2023, 4:36 PM

llvm/test/CodeGen/AArch64/ldrpre-ldr-merge.mir
608	Thanks for reviewing! By submitting the tests first, I suppose merge cases should be no merge to avoid test failures and then updated to merge here, correct?

chaosdefinition mentioned this in D152564: [AArch64] Add tests for merging LDRSWpre-LDR pairs. Jun 9 2023, 10:30 AM

Addressing comments by @fhahn:

Add a test case of merging LDRSWpre and LDURSWi
Add test cases with LDRSWpre as the second merge candidate
Split test cases into a separate patch (see D152564)

chaosdefinition added a parent revision: D152564: [AArch64] Add tests for merging LDRSWpre-LDR pairs. Jun 9 2023, 10:41 AM

chaosdefinition marked an inline comment as done.

Harbormaster completed remote builds in B237814: Diff 530016. Jun 9 2023, 12:25 PM

Can you add a test where getMatchingNonSExtOpcode would be true, and we could turn into a ldp+sext? I'm not sure if it will actually happen at the moment with pre nodes. LDRSWpre + LDRWui and maybe LDRWpre + LDRSWui

Update the patch to be based on the updated D152564, no functionality change.

Replying to @dmgreen (D152407#4423993):

Addressed in D152564. Indeed these are currently not merged; perhaps fix them in the future?

Harbormaster completed remote builds in B240548: Diff 533659. Jun 22 2023, 12:50 PM

Ping...

Sorry for the delay. This fell off my radar. The changes LGTM, as far as I can see.

This revision is now accepted and ready to land. Jul 18 2023, 12:31 AM

Closed by commit rGb0093e13fcfd: [AArch64] Merge LDRSWpre-LD[U]RSW pair into LDPSWpre (authored by chaosdefinition). Jul 18 2023, 9:47 AM

This revision was automatically updated to reflect the committed changes.

chaosdefinition mentioned this in rG94f76004d53d: [AArch64] Add tests for merging LDRSWpre-LDR pairs.

chaosdefinition added a commit: rGb0093e13fcfd: [AArch64] Merge LDRSWpre-LD[U]RSW pair into LDPSWpre.

Heads-up - this caused multiple segmentation faults in our code base. We are working on a reproducer.

We see a miscompilation after this patch in protobuf under MSan. This is the line where the miscompilation happens, and I extracted it to this godbolt with compilation option -fsanitize=memory -O.

Before this patch, the AArch64 assembly is (registers are renamed a bit, because this is disassembled from a binary I built internally)

1bfbbc64: b8808d49      ldrsw   x9, [x10, #8]!
1bfbbc68: ca0d014e      eor     x14, x10, x13
1bfbbc6c: b8404d4c      ldr     w12, [x10, #4]!
1bfbbc70: ca0d014d      eor     x13, x10, x13

After, it became

1bfbb51c: 29c13149      ldp     w9, w12, [x10, #8]!
1bfbb520: 93407d4a      sxtw    x10, w10
1bfbb524: ca0d014e      eor     x14, x10, x13
1bfbb528: ca0d014d      eor     x13, x10, x13

There are two problems:

1. The "after" sequence modifies x10 (via sxtw) before calculating x13 and x14, which are supposed to be the addresses of the shadow memory. This caused the crash because our x10 was pointing to the stack, where the 32nd bit is 1. I believe that line should have been sxtw x9, w9 instead, to perform the sign extension that ldrsw did in the "before" sequence.
2. This also miscalculates x13: after the "before" sequence it holds old(x13) xor (x10 + 12), yet after the "after" sequence it holds old(x13) xor (x10 + 8), the same as x14.
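
The two problems can be reproduced with a small register-level model of the two listings. This is a sketch, not the reporter's reproducer: `Regs`, `sxtw`, `runBefore` and `runAfter` are hypothetical helpers, only the address-related registers are modelled, and any starting values fed to it are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Only x10, x13 and x14 are modelled. The loaded values (x9, w12) do not
// feed into these registers in either listing, so they are left out.
struct Regs {
  uint64_t x10, x13, x14;
};

// sxtw xd, wn: sign-extend the low 32 bits of a register to 64 bits.
inline uint64_t sxtw(uint64_t Value) {
  return static_cast<uint64_t>(
      static_cast<int64_t>(static_cast<int32_t>(static_cast<uint32_t>(Value))));
}

// ldrsw x9, [x10, #8]!
// eor   x14, x10, x13
// ldr   w12, [x10, #4]!
// eor   x13, x10, x13
inline Regs runBefore(Regs R) {
  R.x10 += 8;             // first pre-index writeback
  R.x14 = R.x10 ^ R.x13;
  R.x10 += 4;             // second pre-index writeback
  R.x13 = R.x10 ^ R.x13;
  return R;
}

// ldp  w9, w12, [x10, #8]!
// sxtw x10, w10
// eor  x14, x10, x13
// eor  x13, x10, x13
inline Regs runAfter(Regs R) {
  R.x10 += 8;             // only one writeback survives the merge
  R.x10 = sxtw(R.x10);    // sign extension applied to the base register
  R.x14 = R.x10 ^ R.x13;
  R.x13 = R.x10 ^ R.x13;  // x13 is still the old value here, so x13 == x14
  return R;
}
```

Starting from a pointer-like x10 whose bit 31 is set after the writeback, `runAfter` yields a sign-extended x10 and identical x13 and x14, while `runBefore` yields two distinct shadow addresses 4 bytes apart.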

Replying to @eaeltsin (D152407#4531841) and @scw (D152407#4533478):

Thanks for reporting and reproducing the miscompilation! It looks like two pre-indexed loads, LDRSWpre and LDRWpre, got merged into an LDPWpre and an SBFMXri, which is an unintended result of this patch. I think we can either roll back this patch or apply a quick fix, like the following, that disallows merging two pre-indexed loads. Any thoughts?

diff --git a/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp b/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
index 419b471db3a3..e64b1d0658f0 100644
--- a/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
+++ b/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
@@ -1317,20 +1317,24 @@ static bool areCandidatesToMergeOrPair(MachineInstr &FirstMI, MachineInstr &MI,
                                   MI.getFlag(MachineInstr::FrameDestroy)))
     return false;

   unsigned OpcA = FirstMI.getOpcode();
   unsigned OpcB = MI.getOpcode();

   // Opcodes match: If the opcodes are pre ld/st there is nothing more to check.
   if (OpcA == OpcB)
     return !AArch64InstrInfo::isPreLdSt(FirstMI);

+  // Two pre ld/st of different opcodes cannot be merged either
+  if (AArch64InstrInfo::isPreLdSt(FirstMI) && AArch64InstrInfo::isPreLdSt(MI))
+    return false;
+
   // Try to match a sign-extended load/store with a zero-extended load/store.
   bool IsValidLdStrOpc, PairIsValidLdStrOpc;
   unsigned NonSExtOpc = getMatchingNonSExtOpcode(OpcA, &IsValidLdStrOpc);
   assert(IsValidLdStrOpc &&
          "Given Opc should be a Load or Store with an immediate");
   // OpcA will be the first instruction in the pair.
   if (NonSExtOpc == getMatchingNonSExtOpcode(OpcB, &PairIsValidLdStrOpc)) {
     Flags.setSExtIdx(NonSExtOpc == (unsigned)OpcA ? 1 : 0);
     return true;
   }

Hi there! I'm not an expert here, but I spoke with @scw, and he thinks the fix doesn't address the "sxtw on wrong register" bug. He'll probably be able to comment with details later, but his vote is to roll back for now. Could you please do that?

Thanks

alexfh added a reverting change: D156328: Revert "[AArch64] Merge LDRSWpre-LD[U]RSW pair into LDPSWpre". Jul 26 2023, 6:36 AM

Replying to @asmok-g (D152407#4534575):

Sent a revert: https://reviews.llvm.org/D156328 (will commit after the precommit checks finish).

alexfh added a reverting change: rG0def4e6b0f63: Revert "[AArch64] Merge LDRSWpre-LD[U]RSW pair into LDPSWpre". Jul 26 2023, 7:22 AM

chaosdefinition reopened this revision. Jul 26 2023, 2:25 PM

chaosdefinition added inline comments.

llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
1334	Replying to @asmok-g (D152407#4534575): The wrong register in `SBFMXri` resulted from `SExtIdx` being set to 0 here. However, to reach here, `OpcA` and `OpcB` must be different, and `OpcA` and `OpcB` must have the same `NonSExtOpcode`. So before this patch, this code path was taken only for pairs of `LDRSWui`-`LDRWui` and `LDURSWi`-`LDURWi`. With this patch, the only added case that reaches here is an `LDRSWpre`-`LDRWpre` pair, as was caught by the miscompilation. So my quick fix should prevent such pairs from generating `SBFMXri` by bailing out early on two pre-indexed loads. I guess there might be a better way to fix this. Any thoughts while I work on an updated patch? Thanks!
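
The reasoning in this inline comment can be checked against a reduced model of the candidate test. The sketch below is an assumption-laden simplification: `Opc`, `Verdict` and `classify` are hypothetical names covering only six 32-bit load opcodes, and the frame-flag checks and the scaled/unscaled matching of the real `areCandidatesToMergeOrPair` are omitted.

```cpp
#include <cassert>

// Hypothetical stand-ins for a handful of AArch64 load opcodes.
enum class Opc { LDRWui, LDURWi, LDRWpre, LDRSWui, LDURSWi, LDRSWpre };

enum class Verdict { NoPair, Pair, PairWithSExt };

inline bool isPre(Opc O) { return O == Opc::LDRWpre || O == Opc::LDRSWpre; }

// Same mapping as in the patch: a sign-extending load maps to its plain
// 32-bit counterpart, every other opcode maps to itself.
inline Opc nonSExt(Opc O) {
  switch (O) {
  case Opc::LDRSWui:
    return Opc::LDRWui;
  case Opc::LDURSWi:
    return Opc::LDURWi;
  case Opc::LDRSWpre:
    return Opc::LDRWpre;
  default:
    return O;
  }
}

// A pre-indexed load followed by a scaled or unscaled load of the same kind.
inline bool isPrePairCandidate(Opc A, Opc B) {
  if (A == Opc::LDRWpre)
    return B == Opc::LDRWui || B == Opc::LDURWi;
  if (A == Opc::LDRSWpre)
    return B == Opc::LDRSWui || B == Opc::LDURSWi;
  return false;
}

// WithFix toggles the early bail-out on two pre-indexed loads.
inline Verdict classify(Opc A, Opc B, bool WithFix) {
  if (A == B)
    return isPre(A) ? Verdict::NoPair : Verdict::Pair;
  if (WithFix && isPre(A) && isPre(B))
    return Verdict::NoPair;
  if (nonSExt(A) == nonSExt(B))
    return Verdict::PairWithSExt; // the path that sets SExtIdx
  if (isPrePairCandidate(A, B))
    return Verdict::Pair;
  return Verdict::NoPair; // scaled/unscaled matching omitted in this sketch
}
```

Without the bail-out, `classify(Opc::LDRSWpre, Opc::LDRWpre, false)` lands on the sign-extension path. With it, that pair is rejected while `LDRSWui`-`LDRWui` and `LDRSWpre`-`LDRSWui` are classified as before.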

This revision is now accepted and ready to land. Jul 26 2023, 2:25 PM

Reverting in the short term does sound like a good idea to make sure LLVM 17 is in good shape. We can then work on getting this recommitted with the fix.

Hi, I'm reopening this with the following updates:

Apply the quick fix described in D152407#4534022 that disallows merging two pre-indexed loads, assuming there's no better approach.
Add MIR regression tests modified from @scw's reproduction.

Thanks!

Harbormaster completed remote builds in B255037: Diff 553674. Aug 25 2023, 8:26 PM

Replying to @chaosdefinition (D152407#4619082):

Test case LGTM, I don't have enough knowledge for the code to know if the proposed fix is the best way though.

Also, should this change reuse the same D number or should we create a new one?

Replying to @scw (D152407#4625772):

I don't know what the standard way for this is. I'm inclined to reuse the D number so as to keep the discussion in this thread, but opening a new thread is fine as well.

The same revision should be OK, if the change is just a fix to the existing patch.

I had a look through the LoadStoreOptimizer. If you don't think there are any other cases where this can go wrong, then it LGTM.

Replying to @dmgreen (D152407#4642721):

Thanks. I don't see any other cases that can go wrong with this. Will commit shortly.

Closed by commit rGbcc5b48b0f24: Reapply "[AArch64] Merge LDRSWpre-LD[U]RSW pair into LDPSWpre" (authored by chaosdefinition). Sep 22 2023, 9:09 PM

This revision was automatically updated to reflect the committed changes.

chaosdefinition added a commit: rGbcc5b48b0f24: Reapply "[AArch64] Merge LDRSWpre-LD[U]RSW pair into LDPSWpre".

Revision Contents

Path                                                     Size
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp             7 lines
llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp    8 lines
llvm/test/CodeGen/AArch64/ldrpre-ldr-merge.mir           14 lines

Diff 533659

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp

@@ bool AArch64InstrInfo::hasUnscaledLdStOffset(unsigned Opc) {
   case AArch64::LDURDi:
   case AArch64::LDRDpre:
   case AArch64::LDURQi:
   case AArch64::LDRQpre:
   case AArch64::LDURWi:
   case AArch64::LDRWpre:
   case AArch64::LDURXi:
   case AArch64::LDRXpre:
+  case AArch64::LDRSWpre:
   case AArch64::LDURSWi:
   case AArch64::LDURHHi:
   case AArch64::LDURBBi:
   case AArch64::LDURSBWi:
   case AArch64::LDURSHWi:
     return true;
   }
 }
@@ bool AArch64InstrInfo::isPairableLdStInst(const MachineInstr &MI) {
   case AArch64::LDRDpre:
   case AArch64::LDURQi:
   case AArch64::LDRQpre:
   case AArch64::LDURWi:
   case AArch64::LDRWpre:
   case AArch64::LDURXi:
   case AArch64::LDRXpre:
   case AArch64::LDURSWi:
+  case AArch64::LDRSWpre:
     return true;
   }
 }

 unsigned AArch64InstrInfo::convertToFlagSettingOpc(unsigned Opc) {
   switch (Opc) {
   default:
     llvm_unreachable("Opcode has no flag setting equivalent!");
@@ bool AArch64InstrInfo::isCandidateToMergeOrPair(const MachineInstr &MI) const {
   bool IsImmPreLdSt = IsPreLdSt && MI.getOperand(3).isImm();

   if (!MI.getOperand(2).isImm() && !IsImmPreLdSt)
     return false;

   // Can't merge/pair if the instruction modifies the base register.
   // e.g., ldr x0, [x0]
   // This case will never occur with an FI base.
-  // However, if the instruction is an LDR/STR<S,D,Q,W,X>pre, it can be merged.
+  // However, if the instruction is an LDR<S,D,Q,W,X,SW>pre or
+  // STR<S,D,Q,W,X>pre, it can be merged.
   // For example:
   // ldr q0, [x11, #32]!
   // ldr q1, [x11, #16]
   // to
   // ldp q0, q1, [x11, #32]!
   if (MI.getOperand(1).isReg() && !IsPreLdSt) {
     Register BaseReg = MI.getOperand(1).getReg();
     const TargetRegisterInfo *TRI = &getRegisterInfo();
@@ int AArch64InstrInfo::getMemScale(unsigned Opc) {
   case AArch64::STRHHui:
   case AArch64::STURHHi:
     return 2;
   case AArch64::LDRSui:
   case AArch64::LDURSi:
   case AArch64::LDRSpre:
   case AArch64::LDRSWui:
   case AArch64::LDURSWi:
+  case AArch64::LDRSWpre:
   case AArch64::LDRWpre:
   case AArch64::LDRWui:
   case AArch64::LDURWi:
   case AArch64::STRSui:
   case AArch64::STURSi:
   case AArch64::STRSpre:
   case AArch64::STRWui:
   case AArch64::STURWi:
@@
 }

 bool AArch64InstrInfo::isPreLd(const MachineInstr &MI) {
   switch (MI.getOpcode()) {
   default:
     return false;
   case AArch64::LDRWpre:
   case AArch64::LDRXpre:
+  case AArch64::LDRSWpre:
   case AArch64::LDRSpre:
   case AArch64::LDRDpre:
   case AArch64::LDRQpre:
     return true;
   }
 }

 bool AArch64InstrInfo::isPreSt(const MachineInstr &MI) {

llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp

@@ static unsigned getMatchingNonSExtOpcode(unsigned Opc,
   case AArch64::LDRSui:
   case AArch64::LDURSi:
   case AArch64::LDRSpre:
     return Opc;
   case AArch64::LDRSWui:
     return AArch64::LDRWui;
   case AArch64::LDURSWi:
     return AArch64::LDURWi;
+  case AArch64::LDRSWpre:
+    return AArch64::LDRWpre;
   }
 }

 static unsigned getMatchingWideOpcode(unsigned Opc) {
   switch (Opc) {
   default:
     llvm_unreachable("Opcode has no wide equivalent!");
   case AArch64::STRBBui:
@@ static unsigned getMatchingPairOpcode(unsigned Opc) {
   case AArch64::LDRXui:
   case AArch64::LDURXi:
     return AArch64::LDPXi;
   case AArch64::LDRXpre:
     return AArch64::LDPXpre;
   case AArch64::LDRSWui:
   case AArch64::LDURSWi:
     return AArch64::LDPSWi;
+  case AArch64::LDRSWpre:
+    return AArch64::LDPSWpre;
   }
 }

 static unsigned isMatchingStore(MachineInstr &LoadInst,
                                 MachineInstr &StoreInst) {
   unsigned LdOpc = LoadInst.getOpcode();
   unsigned StOpc = StoreInst.getOpcode();
   switch (LdOpc) {
@@ static bool isPreLdStPairCandidate(MachineInstr &FirstMI, MachineInstr &MI) {
   case AArch64::LDRDpre:
     return (OpcB == AArch64::LDRDui) || (OpcB == AArch64::LDURDi);
   case AArch64::LDRQpre:
     return (OpcB == AArch64::LDRQui) || (OpcB == AArch64::LDURQi);
   case AArch64::LDRWpre:
     return (OpcB == AArch64::LDRWui) || (OpcB == AArch64::LDURWi);
   case AArch64::LDRXpre:
     return (OpcB == AArch64::LDRXui) || (OpcB == AArch64::LDURXi);
+  case AArch64::LDRSWpre:
+    return (OpcB == AArch64::LDRSWui) || (OpcB == AArch64::LDURSWi);
   }
 }

 // Returns the scale and offset range of pre/post indexed variants of MI.
 static void getPrePostIndexedMemOpInfo(const MachineInstr &MI, int &Scale,
                                        int &MinOffset, int &MaxOffset) {
   bool IsPaired = AArch64InstrInfo::isPairedLdSt(MI);
   bool IsTagStore = isTagStore(MI);
@@ static bool areCandidatesToMergeOrPair(MachineInstr &FirstMI, MachineInstr &MI,

   // Try to match a sign-extended load/store with a zero-extended load/store.
   bool IsValidLdStrOpc, PairIsValidLdStrOpc;
   unsigned NonSExtOpc = getMatchingNonSExtOpcode(OpcA, &IsValidLdStrOpc);
   assert(IsValidLdStrOpc &&
          "Given Opc should be a Load or Store with an immediate");
   // OpcA will be the first instruction in the pair.
   if (NonSExtOpc == getMatchingNonSExtOpcode(OpcB, &PairIsValidLdStrOpc)) {
     Flags.setSExtIdx(NonSExtOpc == (unsigned)OpcA ? 1 : 0);
     return true;
   }

   // If the second instruction isn't even a mergable/pairable load/store, bail
   // out.
   if (!PairIsValidLdStrOpc)
     return false;

   // FIXME: We don't support merging narrow stores with mixed scaled/unscaled
   // offsets.
   if (isNarrowStore(OpcA) || isNarrowStore(OpcB))
     return false;

   // The STR<S,D,Q,W,X>pre - STR<S,D,Q,W,X>ui and
-  // LDR<S,D,Q,W,X>pre-LDR<S,D,Q,W,X>ui
+  // LDR<S,D,Q,W,X,SW>pre-LDR<S,D,Q,W,X,SW>ui
   // are candidate pairs that can be merged.
   if (isPreLdStPairCandidate(FirstMI, MI))
     return true;

   // Try to match an unscaled load/store with a scaled load/store.
   return TII->hasUnscaledLdStOffset(OpcA) != TII->hasUnscaledLdStOffset(OpcB) &&
          getMatchingPairOpcode(OpcA) == getMatchingPairOpcode(OpcB);

llvm/test/CodeGen/AArch64/ldrpre-ldr-merge.mir

@@ bb.0:
     renamable $s1 = LDRSui renamable $x1, 1 :: (load (s32))
     STRSui killed renamable $s0, renamable $x1, 0 :: (store (s32))
     STRSui killed renamable $s1, renamable $x1, 1 :: (store (s32))
     RET undef $lr
 ...


 ---
-name: 21-ldrswpre-ldrswui-no-merge
+name: 21-ldrswpre-ldrswui-merge
 tracksRegLiveness: true
 liveins:
   - { reg: '$x0' }
   - { reg: '$x1' }
   - { reg: '$x2' }
 machineFunctionInfo:
   hasRedZone: false
 body: |
   bb.0:
     liveins: $x0, $x1, $x2
-    ; CHECK-LABEL: name: 21-ldrswpre-ldrswui-no-merge
+    ; CHECK-LABEL: name: 21-ldrswpre-ldrswui-merge
     ; CHECK: liveins: $x0, $x1, $x2
-    ; CHECK: early-clobber renamable $x1, renamable $x0 = LDRSWpre renamable $x1, 40, implicit $w1 :: (load (s32))
-    ; CHECK: renamable $x2 = LDRSWui renamable $x1, 1 :: (load (s32))
+    ; CHECK: early-clobber $x1, renamable $x0, renamable $x2 = LDPSWpre renamable $x1, 10 :: (load (s32))
     ; CHECK: STPXi renamable $x0, renamable $x2, renamable $x1, 0 :: (store (s64))
     ; CHECK: RET undef $lr
     early-clobber renamable $x1, renamable $x0 = LDRSWpre killed renamable $x1, 40 :: (load (s32))
     renamable $x2 = LDRSWui renamable $x1, 1 :: (load (s32))
     STRXui killed renamable $x0, renamable $x1, 0 :: (store (s64))
     STRXui killed renamable $x2, renamable $x1, 1 :: (store (s64))
     RET undef $lr
 ...


 ---
-name: 22-ldrswpre-ldurswi-no-merge
+name: 22-ldrswpre-ldurswi-merge
 tracksRegLiveness: true
 liveins:
   - { reg: '$x0' }
   - { reg: '$x1' }
   - { reg: '$x2' }
 machineFunctionInfo:
   hasRedZone: false
 body: |
   bb.0:
     liveins: $x0, $x1, $x2
-    ; CHECK-LABEL: name: 22-ldrswpre-ldurswi-no-merge
+    ; CHECK-LABEL: name: 22-ldrswpre-ldurswi-merge
     ; CHECK: liveins: $x0, $x1, $x2
-    ; CHECK: early-clobber renamable $x1, renamable $x0 = LDRSWpre renamable $x1, 40, implicit $w1 :: (load (s32))
-    ; CHECK: renamable $x2 = LDURSWi renamable $x1, 4 :: (load (s32))
+    ; CHECK: early-clobber $x1, renamable $x0, renamable $x2 = LDPSWpre renamable $x1, 10 :: (load (s32))
     ; CHECK: STPXi renamable $x0, renamable $x2, renamable $x1, 0 :: (store (s64))
     ; CHECK: RET undef $lr
     early-clobber renamable $x1, renamable $x0 = LDRSWpre killed renamable $x1, 40 :: (load (s32))
     renamable $x2 = LDURSWi renamable $x1, 4 :: (load (s32))
     STRXui killed renamable $x0, renamable $x1, 0 :: (store (s64))
     STRXui killed renamable $x2, renamable $x1, 1 :: (store (s64))
     RET undef $lr
 ...
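
In the test changes, the standalone LDRSWpre carries an immediate of 40 while the merged LDPSWpre carries 10. That matches a byte offset being divided by the 4-byte access size for the paired form. A minimal sketch of the conversion follows, assuming the architectural signed 7-bit scaled immediate of load-pair instructions. `scaledPairImm` is a hypothetical helper, not a function from the pass, and the pass's own range checks may differ.

```cpp
#include <cassert>
#include <optional>

// Convert a byte offset into the scaled immediate of a paired load.
// Assumption: the immediate is a signed 7-bit value, so it must lie in
// [-64, 63] after dividing by the access size (4 bytes for LDPSW).
inline std::optional<int> scaledPairImm(int ByteOffset, int Scale = 4) {
  if (ByteOffset % Scale != 0)
    return std::nullopt; // not a multiple of the access size
  int Imm = ByteOffset / Scale;
  if (Imm < -64 || Imm > 63)
    return std::nullopt; // does not fit the signed 7-bit field
  return Imm;
}
```

With this helper, a byte offset of 40 maps to 10, while offsets such as 42 (misaligned) or 256 (out of range) have no paired encoding.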