This is an archive of the discontinued LLVM Phabricator instance.

Differential D70072

[ARM] Improve codegen of volatile load/store of i64
ClosedPublic

Authored by vhscampos on Nov 11 2019, 3:53 AM.

Details

Reviewers

dmgreen
efriedma
john.brawn
nickdesaulniers

Commits

rGc010d4d19550: [ARM] Improve codegen of volatile load/store of i64
rG8a1255322318: [ARM] Improve codegen of volatile load/store of i64
rG60e0120c913d: [ARM] Improve codegen of volatile load/store of i64
rGbbcf1c3496ce: [ARM] Improve codegen of volatile load/store of i64

Summary

Instead of generating two i32 instructions for each load or store of a volatile
i64 value (two LDRs or STRs), now emit LDRD/STRD.

These improvements cover architectures implementing ARMv5TE or Thumb-2.

The code generation explicitly avoids the register-offset variant of
LDRD/STRD. In that variant, the register allocated to the register offset
must not be reused in any of the remaining operands. Enforcing this
restriction appears to be non-trivial in LLVM, so it is left as a to-do.
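
As an illustration only, source of roughly the following shape yields the volatile i64 accesses this patch targets. The names g_counter, readCounter and writeCounter are hypothetical and do not come from the patch or its test file; whether LDRD/STRD is actually emitted depends on the target (ARMv5TE or Thumb-2) and on the compiler version.

```cpp
#include <cstdint>

// Hypothetical 64-bit device register, modelled as an ordinary volatile
// global so that the example also builds and runs on a host machine.
volatile std::uint64_t g_counter = 0;

// One volatile 64-bit load. On a 32-bit ARM target this is the access that
// was previously split into two LDRs and is now meant to become one LDRD.
std::uint64_t readCounter() { return g_counter; }

// One volatile 64-bit store, the STRD counterpart of the load above.
void writeCounter(std::uint64_t value) { g_counter = value; }
```

Compiling such a file for a 32-bit ARM target and inspecting the assembly is one way to see which instructions a given compiler build selects.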

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

vhscampos created this revision. Nov 11 2019, 3:53 AM

Herald added a project: Restricted Project. Nov 11 2019, 3:53 AM

Herald added subscribers: llvm-commits, hiraditya, kristof.beyls.

Harbormaster completed remote builds in B40736: Diff 228663. Nov 11 2019, 3:58 AM

vhscampos added a reviewer: dmgreen. Nov 11 2019, 4:03 AM

vhscampos added reviewers: efriedma, john.brawn. Nov 11 2019, 5:13 AM

The architecture specification provides limited guarantees here, but I guess this doesn't do any harm.

llvm/lib/Target/ARM/ARMISelLowering.cpp
9166	I'd prefer not to exclude extending loads here. Could lead to weird cases where we miss the transform.
9180	Please don't make MachineSDNodes this early; it might appear to mostly work, but other code is not expecting MachineSDNodes at this point. See ARMISelLowering.h for how to introduce a target-specific SDNode.

No longer exclude extloads.
Create new ARMISD nodes specific to load/store of dual registers.
Custom lower i64 volatile loads/stores to these new ARMISD nodes.

Harbormaster completed remote builds in B40875: Diff 229043. Nov 13 2019, 2:34 AM

efriedma added inline comments. Nov 18 2019, 5:52 PM

llvm/lib/Target/ARM/ARMISelLowering.cpp
9225	Loads and stores should not have an "Offset" at this point; we don't form pre/post-indexed operations until after legalization. Maybe worth asserting "isUnindexed()". (Same for loads.)

efriedma added inline comments. Nov 18 2019, 5:52 PM

llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp
3929	You should be able to use TableGen patterns for these, I think. AArch64Prefetch is an example of something similar to what you want.
llvm/lib/Target/ARM/ARMISelLowering.cpp
9166	You still need to handle extending loads somehow... Actually, maybe we don't mess with volatile loads in DAGCombine, and you don't need to implement it. In that case, it would still be nice to have an assertion, in case someone changes it at some point.
9213	We also don't want to restrict truncating stores.

dmgreen added inline comments. Nov 19 2019, 4:16 AM

llvm/lib/Target/ARM/ARMISelLowering.cpp
9227	Likewise, this can just say if (Subtarget->hasMVEIntegerOps() && (VT == MVT::v4i1 || VT == MVT::v8i1 || VT == MVT::v16i1)). The other cases (isTruncatingStore and isUnindexed) should not come up. If they do we should noisily fail with an assert.
9473	How come this is altered, but not the LowerPredicateLoad?

vhscampos marked an inline comment as done. Nov 19 2019, 5:16 AM

vhscampos added inline comments.

llvm/lib/Target/ARM/ARMISelLowering.cpp
9473	The custom lowering of loads and stores here is triggered by the DAG Type Legalizer, since i64 is not a legal type on this target. In DAGTypeLegalizer::CustomLowerNode(), custom lowering of loads is directed to ARMTargetLowering::ReplaceNodeResults(), which then calls LowerLOAD(), created in the present patch. Only the lowering of stores is directed to ARMTargetLowering::LowerOperation(). In summary, the custom lowering of loads with illegal result types does not go through here, so I believe there's no need to change it at this point.

Move the custom SD nodes to TableGen.
Truncating stores are no longer restricted.
Remove isUnindexed() calls since they should always return true at this point.
Extend test to also check loads/stores that have an offset.

vhscampos marked an inline comment as done. Nov 20 2019, 8:49 AM

vhscampos added inline comments.

llvm/lib/Target/ARM/ARMISelLowering.cpp
9166	Your last comment on this wasn't clear to me. If after the latest change you still want me to add anything, please let me know.

Harbormaster completed remote builds in B41248: Diff 230276. Nov 20 2019, 8:53 AM

efriedma added inline comments. Nov 21 2019, 4:34 PM

llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp
1322	Are you sure this range computation is right? The range should be multiples of 4 from -1020 to 1020.

Fixed the immediate range check to cover -1020 to 1020.
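
A minimal sketch of the range under discussion, assuming the Thumb-2 immediate form encodes an 8-bit magnitude scaled by 4 together with a sign. The helper name isLegalT2DualOffset is hypothetical and is not part of LLVM; the patch itself relies on isScaledConstantInRange().

```cpp
#include <cstdint>

// Returns true if `offset` (in bytes) is a multiple of 4 within the
// inclusive range [-1020, 1020], i.e. +/- (imm8 * 4).
constexpr bool isLegalT2DualOffset(std::int32_t offset) {
  return offset % 4 == 0 && offset >= -1020 && offset <= 1020;
}

// Boundary cases of the kind the reviewers ask to be tested.
static_assert(isLegalT2DualOffset(1020), "upper bound is included");
static_assert(isLegalT2DualOffset(-1020), "lower bound is included");
static_assert(!isLegalT2DualOffset(1024), "just past the upper bound");
static_assert(!isLegalT2DualOffset(1022), "must be a multiple of 4");
```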

Harbormaster completed remote builds in B41363: Diff 230622. Nov 22 2019, 3:22 AM

dmgreen added inline comments. Nov 27 2019, 7:59 AM

llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp
1322	It may be worth adding tests for cases that are close to or just over the boundary.

Update summary to have a better explanation of this patch.
Add a post-ISel hook to add register allocation hints to LDRD/STRD operands.
In the AArch32 case, move ISel from TableGen back to the C++ side. This is needed because we must have a custom lowering whenever LDRD/STRD selection would normally yield a register offset. The ARM Load/Store Optimizer is not able to handle LDRD/STRD's register offsets in the cases where LDRD/STRD must be reverted to LDM/STM. As such, the C++ instruction selection opts for not generating instructions with a register offset.
Improve test by testing several immediate boundary cases.

vhscampos edited the summary of this revision. Dec 11 2019, 5:48 AM

Harbormaster completed remote builds in B42303: Diff 233337. Dec 11 2019, 5:50 AM

efriedma added inline comments. Dec 12 2019, 10:28 AM

llvm/lib/Target/ARM/ARMISelLowering.cpp
10977	Oh, I didn't notice this before. There's a problem here: if we're trying to generate LDRD for the sake of the extra guarantees provided by some targets, transforming that to LDM is wrong; it doesn't have the same guarantee, and therefore could cause unpredictable, subtle problems. Probably we want to allocate a GPRPair, instead of allocating two separate registers and trying to tie them together with a hint. Maybe requires defining a new pseudo-instruction that takes a GPRPair instead of two GPRs. Or I'd be okay with just restricting the optimization to Thumb2 for now, if you don't want to do the extra work right now.

vhscampos edited the summary of this revision. Dec 16 2019, 6:29 AM

Create ARM PseudoInsts that take a register pair operand. This way we can enforce the allocation requirement.
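
To make the allocation requirement concrete, here is a small model of the ARM-mode rule. It is stated as an assumption drawn from the architecture manual rather than from this review: the first transfer register must be even-numbered, the second must be the next register, and the second must not be PC (r15). The function name is hypothetical, and this only models the constraint; in the patch it is expressed by giving the pseudo-instructions a register pair operand.

```cpp
// Registers are given by number (r0 = 0, ..., r15 = 15).
// Returns true if (rt, rt2) is an acceptable ARM-mode LDRD/STRD pair
// under the assumptions described above.
constexpr bool isValidA32DualPair(unsigned rt, unsigned rt2) {
  return rt % 2 == 0 && rt2 == rt + 1 && rt2 != 15;
}

static_assert(isValidA32DualPair(0, 1), "r0/r1 is an even/odd pair");
static_assert(!isValidA32DualPair(1, 2), "first register must be even");
static_assert(!isValidA32DualPair(0, 2), "registers must be consecutive");
static_assert(!isValidA32DualPair(14, 15), "second register must not be PC");
```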

Harbormaster completed remote builds in B42547: Diff 234045. Dec 16 2019, 6:33 AM

efriedma added inline comments. Dec 17 2019, 4:33 PM

llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp
3718	I think you could implement this pattern in TableGen (grep for REG_SEQUENCE in the ARM .td files). But probably not the corresponding load pattern, since the inverse opcode doesn't exist, so not a big deal either way.
llvm/lib/Target/ARM/ARMInstrInfo.td
2745	hasExtraDefRegAllocReq shouldn't be necessary; the regular ldrd/strd are weird because the register allocation constraint isn't expressed correctly by the operand types, but you don't have that problem here.

Changed STOREDUAL instruction selection to use TableGen patterns.
Removed one constraint from LOADDUAL and STOREDUAL.

Harbormaster completed remote builds in B42718: Diff 234493. Dec 18 2019, 2:49 AM

LGTM

This revision is now accepted and ready to land. Dec 18 2019, 2:26 PM

Closed by commit rGbbcf1c3496ce: [ARM] Improve codegen of volatile load/store of i64 (authored by vhscampos). Dec 19 2019, 3:24 AM

This revision was automatically updated to reflect the committed changes.

Hi Victor,

this commit breaks "LLVM::2010-05-03-OriginDIE.ll" test on "llvm-clang-win-x-armv7l" builder.
Here is the failed build http://lab.llvm.org:8011/builders/llvm-clang-win-x-armv7l/builds/1851

Would you take a look?

Thanks,
Vlad.

I created a new patch fixing the issue: https://reviews.llvm.org/D71749

Hi Victor, we're observing a crash in Clang when compiling the Linux kernel bisected to this commit. Can you please revert?
See the trace in https://ci.linaro.org/job/tcwg_kernel-bisect-llvm-master-arm-next-allmodconfig/62/artifact/artifacts/build-bbcf1c3496ce2bd1ed87e8fb15ad896e279633ce/console.log grep for stack dump.

I have reverted this commit.

Also, this revision was reopened so that I can append its fix here for reviewing.

This revision is now accepted and ready to land. Dec 20 2019, 10:13 AM

This is the same patch as before but with fixes for the tests that regressed (as reported in the comments here).

The fix is to specify the address mode of the two pseudo instructions introduced.

Harbormaster completed remote builds in B43154: Diff 235844. Jan 2 2020, 3:30 AM

Thanks @vhscampos , I tested the new patch and could no longer reproduce the observed crashes. I'm not sure if I'm looking at the interdiff correctly, but consider adding additional test cases that properly describe and cover the breakage we observed.

Added one test to cover the issue related to loads/stores to a stack frame.

Harbormaster completed remote builds in B43418: Diff 236557. Jan 7 2020, 5:14 AM

Closed by commit rG60e0120c913d: [ARM] Improve codegen of volatile load/store of i64 (authored by vhscampos). Jan 7 2020, 5:23 AM

This revision was automatically updated to reflect the committed changes.

@vhscampos sorry, we're getting new/different warnings now seemingly with this patch: https://github.com/ClangBuiltLinux/linux/issues/838

Warning: index register overlaps transfer register

nathanchance added a subscriber: nathanchance. Jan 16 2020, 11:29 AM

Apparently the ARM-mode LDRD is a bit more strange than I realized. From the ARM manual: if t2 == 15 || m == 15 || m == t || m == t2 then UNPREDICTABLE. I guess we've managed to avoid running into this in the past by never generating the register form of ldrd.

It should be possible to express this constraint to the register allocator using @earlyclobber. (@earlyclobber is actually a little more conservative than we need, strictly speaking, but the difference probably doesn't matter too much.)
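
The quoted condition can be written as a small predicate. This is only a sketch: t and t2 are the two transfer register numbers, m is the index register number, and the function name is hypothetical.

```cpp
// Mirrors the quoted rule for the register form of ARM-mode LDRD:
// the access is UNPREDICTABLE if the second transfer register or the index
// register is PC (15), or if the index register overlaps a transfer register.
constexpr bool isUnpredictableRegFormLdrd(unsigned t, unsigned t2, unsigned m) {
  return t2 == 15 || m == 15 || m == t || m == t2;
}

static_assert(isUnpredictableRegFormLdrd(0, 1, 0), "index overlaps Rt");
static_assert(isUnpredictableRegFormLdrd(0, 1, 1), "index overlaps Rt2");
static_assert(!isUnpredictableRegFormLdrd(0, 1, 2), "distinct registers are fine");
```

The overlap cases (m == t and m == t2) correspond to the "index register overlaps transfer register" assembler warning reported above.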

@nickdesaulniers @efriedma @nathanchance Apologies for missing the latest comments! Since this seems to be a blocker, I'd suggest that this change gets reverted until I am able to have a closer look at the issue of register overlapping. What do you think?

Please revert; I am happy to test a new revision to make sure there are no warnings but I don’t want this shipped in clang-10 and a revert is something that we can easily backport unless you can come up with a fix rather quickly.

Reverted in the master branch.

vhscampos reopened this revision. Feb 8 2020, 5:27 AM

This revision is now accepted and ready to land. Feb 8 2020, 5:27 AM

vhscampos planned changes to this revision. Feb 8 2020, 5:28 AM

In D70072#1865561, @vhscampos wrote:

Reverted in the master branch.

I see it was also reverted on the release/10.x branch 7996b49053f0508717f4a081d197ddc3073f4b5f. Thanks for keeping the branch in mind, but as mentioned in http://lists.llvm.org/pipermail/llvm-dev/2020-January/138295.html please check with me before pushing directly to it.

alanphipps added a subscriber: alanphipps. Feb 10 2020, 2:49 PM

Changes:

1. Explicitly avoid using the register-offset variant of LDRD/STRD. This variant has a limitation on register allocation: the register allocated to the register-offset cannot be reused in any of the remaining operands. I could not find an easy way to implement this in LLVM, so I left it as a future to-do.
2. Instruction selection of STRD was moved from TableGen to C++ because of point (1).
3. Updated tests to reflect these changes.
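
The fallback described in the first change above can be modelled roughly as follows. This is a toy model, not LLVM code: AddrMode3 and avoidRegisterOffset are hypothetical names, registers are represented as strings, and the real selection code works on SDValue operands produced by SelectAddrMode3().

```cpp
#include <string>

// Toy stand-in for the three operands of ARM addressing mode 3.
struct AddrMode3 {
  std::string base;       // base register or whole address expression
  std::string regOffset;  // empty when no register offset is used
  int immOffset;          // immediate offset operand
};

// If matching produced a register offset, drop it and use the whole address
// computation as the base instead, so the register-offset variant is never
// emitted. The immediate operand is left untouched.
AddrMode3 avoidRegisterOffset(const AddrMode3 &matched,
                              const std::string &wholeAddress) {
  if (matched.regOffset.empty())
    return matched;
  AddrMode3 result = matched;
  result.base = wholeAddress;
  result.regOffset.clear();
  return result;
}
```

The cost of this choice is an extra address computation in the register-offset case; the benefit is that the overlap restriction never has to be communicated to the register allocator.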

This revision is now accepted and ready to land. Mar 10 2020, 10:09 AM

Herald added a subscriber: danielkiss. Mar 10 2020, 10:09 AM

Can I please have this reviewed again? I have addressed the issues reported.

Harbormaster failed remote builds in B48711: Diff 249434! Mar 10 2020, 10:53 AM

Boot tested a clang built arm32 linux kernel in QEMU with this patch applied. Thanks for following up with fixes.

This revision is now accepted and ready to land. Mar 10 2020, 1:05 PM

LGTM with one minor comment.

llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp
3694	The RegOffset check could use a comment explaining what it's doing, here and for STRD.

Added comment to explain the fallback to non-register-offset variants.

Closed by commit rG8a1255322318: [ARM] Improve codegen of volatile load/store of i64 (authored by vhscampos). Mar 11 2020, 3:47 AM

This revision was automatically updated to reflect the committed changes.

Harbormaster failed remote builds in B48792: Diff 249573! Mar 11 2020, 4:30 AM

vhscampos reopened this revision. May 27 2020, 7:56 AM

This revision is now accepted and ready to land. May 27 2020, 7:56 AM

Improve the testcase which exercises loads and stores from stack. Now, wrong frame index replacements will be caught here.

vhscampos requested review of this revision. May 27 2020, 8:01 AM

Harbormaster failed remote builds in B58058: Diff 266545! May 27 2020, 9:11 AM

LGTM

This revision is now accepted and ready to land. May 27 2020, 4:10 PM

Closed by commit rGc010d4d19550: [ARM] Improve codegen of volatile load/store of i64 (authored by vhscampos). May 28 2020, 3:13 AM

This revision was automatically updated to reflect the committed changes.

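Revision Contents

Path	Size
llvm/lib/Target/ARM/ARMExpandPseudoInsts.cpp	18 lines
llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp	82 lines
llvm/lib/Target/ARM/ARMISelLowering.h	8 lines
llvm/lib/Target/ARM/ARMISelLowering.cpp	62 lines
llvm/lib/Target/ARM/ARMInstrInfo.td	22 lines
llvm/lib/Target/ARM/ARMInstrThumb2.td	9 lines
llvm/test/CodeGen/ARM/i64_volatile_load_store.ll	191 lines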

Diff 266796

llvm/lib/Target/ARM/ARMExpandPseudoInsts.cpp

[...]	case ARM::BL_PUSHLR: {
// bl __gnu_mcount_nc		// bl __gnu_mcount_nc
MIB = BuildMI(MBB, MBBI, MI.getDebugLoc(), TII->get(ARM::BL));		MIB = BuildMI(MBB, MBBI, MI.getDebugLoc(), TII->get(ARM::BL));
}		}
MIB.cloneMemRefs(MI);		MIB.cloneMemRefs(MI);
for (unsigned i = 1; i < MI.getNumOperands(); ++i) MIB.add(MI.getOperand(i));		for (unsigned i = 1; i < MI.getNumOperands(); ++i) MIB.add(MI.getOperand(i));
MI.eraseFromParent();		MI.eraseFromParent();
return true;		return true;
}		}
		case ARM::LOADDUAL:
		case ARM::STOREDUAL: {
		Register PairReg = MI.getOperand(0).getReg();

		MachineInstrBuilder MIB =
		BuildMI(MBB, MBBI, MI.getDebugLoc(),
		TII->get(Opcode == ARM::LOADDUAL ? ARM::LDRD : ARM::STRD))
		.addReg(TRI->getSubReg(PairReg, ARM::gsub_0),
		Opcode == ARM::LOADDUAL ? RegState::Define : 0)
		.addReg(TRI->getSubReg(PairReg, ARM::gsub_1),
		Opcode == ARM::LOADDUAL ? RegState::Define : 0);
		for (unsigned i = 1; i < MI.getNumOperands(); i++)
		MIB.add(MI.getOperand(i));
		MIB.add(predOps(ARMCC::AL));
		MIB.cloneMemRefs(MI);
		MI.eraseFromParent();
		return true;
		}
}		}
}		}

bool ARMExpandPseudo::ExpandMBB(MachineBasicBlock &MBB) {		bool ARMExpandPseudo::ExpandMBB(MachineBasicBlock &MBB) {
bool Modified = false;		bool Modified = false;

MachineBasicBlock::iterator MBBI = MBB.begin(), E = MBB.end();		MachineBasicBlock::iterator MBBI = MBB.begin(), E = MBB.end();
while (MBBI != E) {		while (MBBI != E) {

llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp

[...]	public:
bool SelectThumbAddrModeImm5S4(SDValue N, SDValue &Base,		bool SelectThumbAddrModeImm5S4(SDValue N, SDValue &Base,
SDValue &OffImm);		SDValue &OffImm);
bool SelectThumbAddrModeSP(SDValue N, SDValue &Base, SDValue &OffImm);		bool SelectThumbAddrModeSP(SDValue N, SDValue &Base, SDValue &OffImm);
template <unsigned Shift>		template <unsigned Shift>
bool SelectTAddrModeImm7(SDValue N, SDValue &Base, SDValue &OffImm);		bool SelectTAddrModeImm7(SDValue N, SDValue &Base, SDValue &OffImm);

// Thumb 2 Addressing Modes:		// Thumb 2 Addressing Modes:
bool SelectT2AddrModeImm12(SDValue N, SDValue &Base, SDValue &OffImm);		bool SelectT2AddrModeImm12(SDValue N, SDValue &Base, SDValue &OffImm);
		template <unsigned Shift>
		bool SelectT2AddrModeImm8(SDValue N, SDValue &Base, SDValue &OffImm);
bool SelectT2AddrModeImm8(SDValue N, SDValue &Base,		bool SelectT2AddrModeImm8(SDValue N, SDValue &Base,
SDValue &OffImm);		SDValue &OffImm);
bool SelectT2AddrModeImm8Offset(SDNode *Op, SDValue N,		bool SelectT2AddrModeImm8Offset(SDNode *Op, SDValue N,
SDValue &OffImm);		SDValue &OffImm);
template <unsigned Shift>		template <unsigned Shift>
bool SelectT2AddrModeImm7Offset(SDNode *Op, SDValue N, SDValue &OffImm);		bool SelectT2AddrModeImm7Offset(SDNode *Op, SDValue N, SDValue &OffImm);
bool SelectT2AddrModeImm7Offset(SDNode *Op, SDValue N, SDValue &OffImm,		bool SelectT2AddrModeImm7Offset(SDNode *Op, SDValue N, SDValue &OffImm,
unsigned Shift);		unsigned Shift);
[...]	bool ARMDAGToDAGISel::SelectT2AddrModeImm12(SDValue N,
}		}

// Base only.		// Base only.
Base = N;		Base = N;
OffImm = CurDAG->getTargetConstant(0, SDLoc(N), MVT::i32);		OffImm = CurDAG->getTargetConstant(0, SDLoc(N), MVT::i32);
return true;		return true;
}		}

		template <unsigned Shift>
		bool ARMDAGToDAGISel::SelectT2AddrModeImm8(SDValue N, SDValue &Base,
		SDValue &OffImm) {
		if (N.getOpcode() == ISD::SUB || CurDAG->isBaseWithConstantOffset(N)) {
		int RHSC;
		if (isScaledConstantInRange(N.getOperand(1), 1 << Shift, -255, 256, RHSC)) {
		Base = N.getOperand(0);
		if (Base.getOpcode() == ISD::FrameIndex) {
		int FI = cast<FrameIndexSDNode>(Base)->getIndex();
		Base = CurDAG->getTargetFrameIndex(
		FI, TLI->getPointerTy(CurDAG->getDataLayout()));
		}

		if (N.getOpcode() == ISD::SUB)
		RHSC = -RHSC;
		OffImm =
		CurDAG->getTargetConstant(RHSC * (1 << Shift), SDLoc(N), MVT::i32);
		return true;
		}
		}

		// Base only.
		Base = N;
		OffImm = CurDAG->getTargetConstant(0, SDLoc(N), MVT::i32);
		return true;
		}

bool ARMDAGToDAGISel::SelectT2AddrModeImm8(SDValue N,		bool ARMDAGToDAGISel::SelectT2AddrModeImm8(SDValue N,
SDValue &Base, SDValue &OffImm) {		SDValue &Base, SDValue &OffImm) {
// Match simple R - imm8 operands.		// Match simple R - imm8 operands.
if (N.getOpcode() != ISD::ADD && N.getOpcode() != ISD::SUB &&		if (N.getOpcode() != ISD::ADD && N.getOpcode() != ISD::SUB &&
!CurDAG->isBaseWithConstantOffset(N))		!CurDAG->isBaseWithConstantOffset(N))
return false;		return false;

if (ConstantSDNode *RHS = dyn_cast<ConstantSDNode>(N.getOperand(1))) {		if (ConstantSDNode *RHS = dyn_cast<ConstantSDNode>(N.getOperand(1))) {
[...]	SDValue Ops[] = { N->getOperand(1),
N->getOperand(0) };		N->getOperand(0) };
unsigned Opc = N->getOpcode() == ARMISD::WLS ?		unsigned Opc = N->getOpcode() == ARMISD::WLS ?
ARM::t2WhileLoopStart : ARM::t2LoopEnd;		ARM::t2WhileLoopStart : ARM::t2LoopEnd;
SDNode *New = CurDAG->getMachineNode(Opc, dl, MVT::Other, Ops);		SDNode *New = CurDAG->getMachineNode(Opc, dl, MVT::Other, Ops);
ReplaceUses(N, New);		ReplaceUses(N, New);
CurDAG->RemoveDeadNode(N);		CurDAG->RemoveDeadNode(N);
return;		return;
}		}
		case ARMISD::LDRD: {
		if (Subtarget->isThumb2())
		break; // TableGen handles isel in this case.
		SDValue Base, RegOffset, ImmOffset;
		const SDValue &Chain = N->getOperand(0);
		const SDValue &Addr = N->getOperand(1);
		SelectAddrMode3(Addr, Base, RegOffset, ImmOffset);
		if (RegOffset != CurDAG->getRegister(0, MVT::i32)) {
		// The register-offset variant of LDRD mandates that the register
		// allocated to RegOffset is not reused in any of the remaining operands.
		// This restriction is currently not enforced. Therefore emitting this
		// variant is explicitly avoided.
		Base = Addr;
		RegOffset = CurDAG->getRegister(0, MVT::i32);
		}
		SDValue Ops[] = {Base, RegOffset, ImmOffset, Chain};
		SDNode *New = CurDAG->getMachineNode(ARM::LOADDUAL, dl,
		{MVT::Untyped, MVT::Other}, Ops);
		SDValue Lo = CurDAG->getTargetExtractSubreg(ARM::gsub_0, dl, MVT::i32,
		SDValue(New, 0));
		SDValue Hi = CurDAG->getTargetExtractSubreg(ARM::gsub_1, dl, MVT::i32,
		SDValue(New, 0));
		transferMemOperands(N, New);
		ReplaceUses(SDValue(N, 0), Lo);
		ReplaceUses(SDValue(N, 1), Hi);
		ReplaceUses(SDValue(N, 2), SDValue(New, 1));
		CurDAG->RemoveDeadNode(N);
		return;
		}
		case ARMISD::STRD: {
		if (Subtarget->isThumb2())
		break; // TableGen handles isel in this case.
		SDValue Base, RegOffset, ImmOffset;
		const SDValue &Chain = N->getOperand(0);
		const SDValue &Addr = N->getOperand(3);
		SelectAddrMode3(Addr, Base, RegOffset, ImmOffset);
		if (RegOffset != CurDAG->getRegister(0, MVT::i32)) {
		// The register-offset variant of STRD mandates that the register
		// allocated to RegOffset is not reused in any of the remaining operands.
		// This restriction is currently not enforced. Therefore emitting this
		// variant is explicitly avoided.
		Base = Addr;
		RegOffset = CurDAG->getRegister(0, MVT::i32);
		}
		SDNode *RegPair =
		createGPRPairNode(MVT::Untyped, N->getOperand(1), N->getOperand(2));
		SDValue Ops[] = {SDValue(RegPair, 0), Base, RegOffset, ImmOffset, Chain};
		SDNode *New = CurDAG->getMachineNode(ARM::STOREDUAL, dl, MVT::Other, Ops);
		transferMemOperands(N, New);
		ReplaceUses(SDValue(N, 0), SDValue(New, 0));
		CurDAG->RemoveDeadNode(N);
		return;
		}
case ARMISD::LOOP_DEC: {		case ARMISD::LOOP_DEC: {
SDValue Ops[] = { N->getOperand(1),		SDValue Ops[] = { N->getOperand(1),
N->getOperand(2),		N->getOperand(2),
N->getOperand(0) };		N->getOperand(0) };
SDNode *Dec =		SDNode *Dec =
CurDAG->getMachineNode(ARM::t2LoopDec, dl,		CurDAG->getMachineNode(ARM::t2LoopDec, dl,
CurDAG->getVTList(MVT::i32, MVT::Other), Ops);		CurDAG->getVTList(MVT::i32, MVT::Other), Ops);
ReplaceUses(N, Dec);		ReplaceUses(N, Dec);
[...]	case ARMISD::VZIP: {
case MVT::v16i8: Opc = ARM::VZIPq8; break;		case MVT::v16i8: Opc = ARM::VZIPq8; break;
case MVT::v8f16:		case MVT::v8f16:
case MVT::v8i16: Opc = ARM::VZIPq16; break;		case MVT::v8i16: Opc = ARM::VZIPq16; break;
case MVT::v4f32:		case MVT::v4f32:
case MVT::v4i32: Opc = ARM::VZIPq32; break;		case MVT::v4i32: Opc = ARM::VZIPq32; break;
}		}
SDValue Pred = getAL(CurDAG, dl);		SDValue Pred = getAL(CurDAG, dl);
SDValue PredReg = CurDAG->getRegister(0, MVT::i32);		SDValue PredReg = CurDAG->getRegister(0, MVT::i32);
SDValue Ops[] = { N->getOperand(0), N->getOperand(1), Pred, PredReg };		SDValue Ops[] = { N->getOperand(0), N->getOperand(1), Pred, PredReg };
ReplaceNode(N, CurDAG->getMachineNode(Opc, dl, VT, VT, Ops));		ReplaceNode(N, CurDAG->getMachineNode(Opc, dl, VT, VT, Ops));
return;		return;
}		}
case ARMISD::VUZP: {		case ARMISD::VUZP: {
unsigned Opc = 0;		unsigned Opc = 0;
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
switch (VT.getSimpleVT().SimpleTy) {		switch (VT.getSimpleVT().SimpleTy) {
default: return;		default: return;

llvm/lib/Target/ARM/ARMISelLowering.h

[...]	enum NodeType : unsigned {

// NEON stores with post-increment base updates:		// NEON stores with post-increment base updates:
VST1_UPD,		VST1_UPD,
VST2_UPD,		VST2_UPD,
VST3_UPD,		VST3_UPD,
VST4_UPD,		VST4_UPD,
VST2LN_UPD,		VST2LN_UPD,
VST3LN_UPD,		VST3LN_UPD,
VST4LN_UPD		VST4LN_UPD,

		// Load/Store of dual registers
		LDRD,
		STRD
};		};

} // end namespace ARMISD		} // end namespace ARMISD

/// Define some predicates that are used for node matching.		/// Define some predicates that are used for node matching.
namespace ARM {		namespace ARM {

bool isBitFieldInvertedMask(unsigned v);		bool isBitFieldInvertedMask(unsigned v);
[...]	private:
SDValue LowerDYNAMIC_STACKALLOC(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerDYNAMIC_STACKALLOC(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFP_ROUND(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFP_ROUND(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFP_EXTEND(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFP_EXTEND(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFP_TO_INT(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFP_TO_INT(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerINT_TO_FP(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerINT_TO_FP(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFSETCC(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFSETCC(SDValue Op, SelectionDAG &DAG) const;
void lowerABS(SDNode *N, SmallVectorImpl<SDValue> &Results,		void lowerABS(SDNode *N, SmallVectorImpl<SDValue> &Results,
SelectionDAG &DAG) const;		SelectionDAG &DAG) const;
		void LowerLOAD(SDNode *N, SmallVectorImpl<SDValue> &Results,
		SelectionDAG &DAG) const;

Register getRegisterByName(const char* RegName, LLT VT,		Register getRegisterByName(const char* RegName, LLT VT,
const MachineFunction &MF) const override;		const MachineFunction &MF) const override;

SDValue BuildSDIVPow2(SDNode *N, const APInt &Divisor, SelectionDAG &DAG,		SDValue BuildSDIVPow2(SDNode *N, const APInt &Divisor, SelectionDAG &DAG,
SmallVectorImpl<SDNode *> &Created) const override;		SmallVectorImpl<SDNode *> &Created) const override;

bool isFMAFasterThanFMulAndFAdd(const MachineFunction &MF,		bool isFMAFasterThanFMulAndFAdd(const MachineFunction &MF,

llvm/lib/Target/ARM/ARMISelLowering.cpp

[...]	ARMTargetLowering::ARMTargetLowering(const TargetMachine &TM,

setOperationAction(ISD::SHL_PARTS, MVT::i32, Custom);		setOperationAction(ISD::SHL_PARTS, MVT::i32, Custom);
setOperationAction(ISD::SRA_PARTS, MVT::i32, Custom);		setOperationAction(ISD::SRA_PARTS, MVT::i32, Custom);
setOperationAction(ISD::SRL_PARTS, MVT::i32, Custom);		setOperationAction(ISD::SRL_PARTS, MVT::i32, Custom);
setOperationAction(ISD::SRL, MVT::i64, Custom);		setOperationAction(ISD::SRL, MVT::i64, Custom);
setOperationAction(ISD::SRA, MVT::i64, Custom);		setOperationAction(ISD::SRA, MVT::i64, Custom);
setOperationAction(ISD::INTRINSIC_VOID, MVT::Other, Custom);		setOperationAction(ISD::INTRINSIC_VOID, MVT::Other, Custom);
setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::i64, Custom);		setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::i64, Custom);
		setOperationAction(ISD::LOAD, MVT::i64, Custom);
		setOperationAction(ISD::STORE, MVT::i64, Custom);

// MVE lowers 64 bit shifts to lsll and lsrl		// MVE lowers 64 bit shifts to lsll and lsrl
// assuming that ISD::SRL and SRA of i64 are already marked custom		// assuming that ISD::SRL and SRA of i64 are already marked custom
if (Subtarget->hasMVEIntegerOps())		if (Subtarget->hasMVEIntegerOps())
setOperationAction(ISD::SHL, MVT::i64, Custom);		setOperationAction(ISD::SHL, MVT::i64, Custom);

// Expand to __aeabi_l{lsl,lsr,asr} calls for Thumb1.		// Expand to __aeabi_l{lsl,lsr,asr} calls for Thumb1.
if (Subtarget->isThumb1Only()) {		if (Subtarget->isThumb1Only()) {
[...]	const char *ARMTargetLowering::getTargetNodeName(unsigned Opcode) const {
case ARMISD::THREAD_POINTER:return "ARMISD::THREAD_POINTER";		case ARMISD::THREAD_POINTER:return "ARMISD::THREAD_POINTER";

case ARMISD::DYN_ALLOC: return "ARMISD::DYN_ALLOC";		case ARMISD::DYN_ALLOC: return "ARMISD::DYN_ALLOC";

case ARMISD::MEMBARRIER_MCR: return "ARMISD::MEMBARRIER_MCR";		case ARMISD::MEMBARRIER_MCR: return "ARMISD::MEMBARRIER_MCR";

case ARMISD::PRELOAD: return "ARMISD::PRELOAD";		case ARMISD::PRELOAD: return "ARMISD::PRELOAD";

		case ARMISD::LDRD: return "ARMISD::LDRD";
		case ARMISD::STRD: return "ARMISD::STRD";

case ARMISD::WIN__CHKSTK: return "ARMISD::WIN__CHKSTK";		case ARMISD::WIN__CHKSTK: return "ARMISD::WIN__CHKSTK";
case ARMISD::WIN__DBZCHK: return "ARMISD::WIN__DBZCHK";		case ARMISD::WIN__DBZCHK: return "ARMISD::WIN__DBZCHK";

case ARMISD::PREDICATE_CAST: return "ARMISD::PREDICATE_CAST";		case ARMISD::PREDICATE_CAST: return "ARMISD::PREDICATE_CAST";
case ARMISD::VECTOR_REG_CAST: return "ARMISD::VECTOR_REG_CAST";		case ARMISD::VECTOR_REG_CAST: return "ARMISD::VECTOR_REG_CAST";
case ARMISD::VCMP: return "ARMISD::VCMP";		case ARMISD::VCMP: return "ARMISD::VCMP";
case ARMISD::VCMPZ: return "ARMISD::VCMPZ";		case ARMISD::VCMPZ: return "ARMISD::VCMPZ";
case ARMISD::VTST: return "ARMISD::VTST";		case ARMISD::VTST: return "ARMISD::VTST";
[...]	SDValue Load = DAG.getExtLoad(
LD->getMemOperand());		LD->getMemOperand());
SDValue Pred = DAG.getNode(ARMISD::PREDICATE_CAST, dl, MVT::v16i1, Load);		SDValue Pred = DAG.getNode(ARMISD::PREDICATE_CAST, dl, MVT::v16i1, Load);
if (MemVT != MVT::v16i1)		if (MemVT != MVT::v16i1)
Pred = DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, MemVT, Pred,		Pred = DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, MemVT, Pred,
DAG.getConstant(0, dl, MVT::i32));		DAG.getConstant(0, dl, MVT::i32));
return DAG.getMergeValues({Pred, Load.getValue(1)}, dl);		return DAG.getMergeValues({Pred, Load.getValue(1)}, dl);
}		}

		void ARMTargetLowering::LowerLOAD(SDNode *N, SmallVectorImpl<SDValue> &Results,
		SelectionDAG &DAG) const {
		LoadSDNode *LD = cast<LoadSDNode>(N);
		EVT MemVT = LD->getMemoryVT();
		assert(LD->isUnindexed() && "Loads should be unindexed at this point.");

		if (MemVT == MVT::i64 && Subtarget->hasV5TEOps() &&
		!Subtarget->isThumb1Only() && LD->isVolatile()) {
		SDLoc dl(N);
		SDValue Result = DAG.getMemIntrinsicNode(
		ARMISD::LDRD, dl, DAG.getVTList({MVT::i32, MVT::i32, MVT::Other}),
		{LD->getChain(), LD->getBasePtr()}, MemVT, LD->getMemOperand());
		SDValue Lo = Result.getValue(DAG.getDataLayout().isLittleEndian() ? 0 : 1);
		SDValue Hi = Result.getValue(DAG.getDataLayout().isLittleEndian() ? 1 : 0);
		SDValue Pair = DAG.getNode(ISD::BUILD_PAIR, dl, MVT::i64, Lo, Hi);
		Results.append({Pair, Result.getValue(2)});
		}
		}
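
A rough illustration of the endianness handling in LowerLOAD: ARMISD::LDRD yields two i32 values, and the lowering decides which of them becomes the low half before combining them with ISD::BUILD_PAIR. The sketch below models only that selection in plain C++. It assumes result 0 is the word at the lower address; buildPair and its parameter names are hypothetical and are not LLVM APIs.

```cpp
#include <cstdint>

// Hypothetical helper, not an LLVM API.
// Word0 models the first LDRD result (assumed: the word at the lower address),
// Word1 the second result (the word at address + 4).
// Little-endian: Word0 is the low half. Big-endian: Word0 is the high half.
constexpr std::uint64_t buildPair(std::uint32_t Word0, std::uint32_t Word1,
                                  bool IsLittleEndian) {
  return (static_cast<std::uint64_t>(IsLittleEndian ? Word1 : Word0) << 32) |
         static_cast<std::uint64_t>(IsLittleEndian ? Word0 : Word1);
}
```

LowerSTORE does the reverse with ISD::EXTRACT_ELEMENT, choosing element 0 or 1 by the same endianness test.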

static SDValue LowerPredicateStore(SDValue Op, SelectionDAG &DAG) {		static SDValue LowerPredicateStore(SDValue Op, SelectionDAG &DAG) {
StoreSDNode *ST = cast<StoreSDNode>(Op.getNode());		StoreSDNode *ST = cast<StoreSDNode>(Op.getNode());
EVT MemVT = ST->getMemoryVT();		EVT MemVT = ST->getMemoryVT();
		efriedma: Please don't make MachineSDNodes this early; it might appear to mostly work, but other code is not expecting MachineSDNodes at this point. See ARMISelLowering.h for how to introduce a target-specific SDNode.
assert((MemVT == MVT::v4i1 || MemVT == MVT::v8i1 || MemVT == MVT::v16i1) &&		assert((MemVT == MVT::v4i1 || MemVT == MVT::v8i1 || MemVT == MVT::v16i1) &&
"Expected a predicate type!");		"Expected a predicate type!");
assert(MemVT == ST->getValue().getValueType());		assert(MemVT == ST->getValue().getValueType());
assert(!ST->isTruncatingStore() && "Expected a non-extending store");		assert(!ST->isTruncatingStore() && "Expected a non-extending store");
assert(ST->isUnindexed() && "Expected a unindexed store");		assert(ST->isUnindexed() && "Expected a unindexed store");

// Only store the v4i1 or v8i1 worth of bits, via a buildvector with top bits		// Only store the v4i1 or v8i1 worth of bits, via a buildvector with top bits
// unset and a scalar store.		// unset and a scalar store.
[10 lines not shown]	static SDValue LowerPredicateStore(SDValue Op, SelectionDAG &DAG) {
}		}
SDValue GRP = DAG.getNode(ARMISD::PREDICATE_CAST, dl, MVT::i32, Build);		SDValue GRP = DAG.getNode(ARMISD::PREDICATE_CAST, dl, MVT::i32, Build);
return DAG.getTruncStore(		return DAG.getTruncStore(
ST->getChain(), dl, GRP, ST->getBasePtr(),		ST->getChain(), dl, GRP, ST->getBasePtr(),
EVT::getIntegerVT(*DAG.getContext(), MemVT.getSizeInBits()),		EVT::getIntegerVT(*DAG.getContext(), MemVT.getSizeInBits()),
ST->getMemOperand());		ST->getMemOperand());
}		}

		static SDValue LowerSTORE(SDValue Op, SelectionDAG &DAG,
		const ARMSubtarget *Subtarget) {
		StoreSDNode *ST = cast<StoreSDNode>(Op.getNode());
		EVT MemVT = ST->getMemoryVT();
		assert(ST->isUnindexed() && "Stores should be unindexed at this point.");

		if (MemVT == MVT::i64 && Subtarget->hasV5TEOps() &&
		efriedma: We also don't want to restrict truncating stores.
		!Subtarget->isThumb1Only() && ST->isVolatile()) {
		SDNode *N = Op.getNode();
		SDLoc dl(N);

		SDValue Lo = DAG.getNode(
		ISD::EXTRACT_ELEMENT, dl, MVT::i32, ST->getValue(),
		DAG.getTargetConstant(DAG.getDataLayout().isLittleEndian() ? 0 : 1, dl,
		MVT::i32));
		SDValue Hi = DAG.getNode(
		ISD::EXTRACT_ELEMENT, dl, MVT::i32, ST->getValue(),
		DAG.getTargetConstant(DAG.getDataLayout().isLittleEndian() ? 1 : 0, dl,
		MVT::i32));
		efriedma: Loads and stores should not have an "Offset" at this point; we don't form pre/post-indexed operations until after legalization. Maybe worth asserting "isUnindexed()". (Same for loads.)

		return DAG.getMemIntrinsicNode(ARMISD::STRD, dl, DAG.getVTList(MVT::Other),
		dmgreen: Likewise, this can just say: if (Subtarget->hasMVEIntegerOps() && (VT == MVT::v4i1 || VT == MVT::v8i1 || VT == MVT::v16i1)). The other cases (isTruncatingStore and isUnindexed) should not come up. If they do we should noisily fail with an assert.
		{ST->getChain(), Lo, Hi, ST->getBasePtr()},
		MemVT, ST->getMemOperand());
		} else if (Subtarget->hasMVEIntegerOps() &&
		((MemVT == MVT::v4i1 || MemVT == MVT::v8i1 ||
		MemVT == MVT::v16i1))) {
		return LowerPredicateStore(Op, DAG);
		}

		return SDValue();
		}

static bool isZeroVector(SDValue N) {		static bool isZeroVector(SDValue N) {
return (ISD::isBuildVectorAllZeros(N.getNode()) ||		return (ISD::isBuildVectorAllZeros(N.getNode()) ||
(N->getOpcode() == ARMISD::VMOVIMM &&		(N->getOpcode() == ARMISD::VMOVIMM &&
isNullConstant(N->getOperand(0))));		isNullConstant(N->getOperand(0))));
}		}

static SDValue LowerMLOAD(SDValue Op, SelectionDAG &DAG) {		static SDValue LowerMLOAD(SDValue Op, SelectionDAG &DAG) {
MaskedLoadSDNode *N = cast<MaskedLoadSDNode>(Op.getNode());		MaskedLoadSDNode *N = cast<MaskedLoadSDNode>(Op.getNode());
[218 lines not shown]	SDValue ARMTargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const {
case ISD::USUBO:		case ISD::USUBO:
return LowerUnsignedALUO(Op, DAG);		return LowerUnsignedALUO(Op, DAG);
case ISD::SADDSAT:		case ISD::SADDSAT:
case ISD::SSUBSAT:		case ISD::SSUBSAT:
return LowerSADDSUBSAT(Op, DAG, Subtarget);		return LowerSADDSUBSAT(Op, DAG, Subtarget);
case ISD::LOAD:		case ISD::LOAD:
return LowerPredicateLoad(Op, DAG);		return LowerPredicateLoad(Op, DAG);
case ISD::STORE:		case ISD::STORE:
return LowerPredicateStore(Op, DAG);		return LowerSTORE(Op, DAG, Subtarget);
		dmgreen: How come this is altered, but not the LowerPredicateLoad?
		vhscampos: The custom lowering of loads and stores here is triggered by the DAG Type Legalizer, since i64 is not supported. In DAGTypeLegalizer::CustomLowerNode(), custom lowering of loads is directed to ARMTargetLowering::ReplaceNodeResults(), which then calls LowerLOAD(), created in the present patch. The lowering of stores is the one that is directed to ARMTargetLowering::LowerOperation(). In summary, the custom lowering of loads because of illegal result types does not go through here, so I believe there's no need to change it at this point.
case ISD::MLOAD:		case ISD::MLOAD:
return LowerMLOAD(Op, DAG);		return LowerMLOAD(Op, DAG);
case ISD::ATOMIC_LOAD:		case ISD::ATOMIC_LOAD:
case ISD::ATOMIC_STORE: return LowerAtomicLoadStore(Op, DAG);		case ISD::ATOMIC_STORE: return LowerAtomicLoadStore(Op, DAG);
case ISD::FSINCOS: return LowerFSINCOS(Op, DAG);		case ISD::FSINCOS: return LowerFSINCOS(Op, DAG);
case ISD::SDIVREM:		case ISD::SDIVREM:
case ISD::UDIVREM: return LowerDivRem(Op, DAG);		case ISD::UDIVREM: return LowerDivRem(Op, DAG);
case ISD::DYNAMIC_STACKALLOC:		case ISD::DYNAMIC_STACKALLOC:
[87 lines not shown]	void ARMTargetLowering::ReplaceNodeResults(SDNode *N,
case ISD::ATOMIC_CMP_SWAP:		case ISD::ATOMIC_CMP_SWAP:
ReplaceCMP_SWAP_64Results(N, Results, DAG);		ReplaceCMP_SWAP_64Results(N, Results, DAG);
return;		return;
case ISD::INTRINSIC_WO_CHAIN:		case ISD::INTRINSIC_WO_CHAIN:
return ReplaceLongIntrinsic(N, Results, DAG);		return ReplaceLongIntrinsic(N, Results, DAG);
case ISD::ABS:		case ISD::ABS:
lowerABS(N, Results, DAG);		lowerABS(N, Results, DAG);
return ;		return ;
		case ISD::LOAD:
		LowerLOAD(N, Results, DAG);
		break;
}		}
if (Res.getNode())		if (Res.getNode())
Results.push_back(Res);		Results.push_back(Res);
}		}
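
The routing vhscampos describes in the inline reply (loads reach LowerLOAD through ReplaceNodeResults, stores reach LowerSTORE through LowerOperation) can be pictured with a toy model. This is only a sketch of the rule as stated in that reply: the enums and customLoweringHook are hypothetical names, not LLVM APIs, and the real type legalizer does considerably more.

```cpp
// Toy model only: IllegalPart, Hook and customLoweringHook are hypothetical.
enum class IllegalPart { Result, Operand };
enum class Hook { ReplaceNodeResults, LowerOperation };

// Pick the target hook based on where the illegal type (here i64) appears
// on the node being legalized.
constexpr Hook customLoweringHook(IllegalPart Part) {
  return Part == IllegalPart::Result ? Hook::ReplaceNodeResults
                                     : Hook::LowerOperation;
}

// A volatile i64 load produces the illegal type as a result.
static_assert(customLoweringHook(IllegalPart::Result) ==
                  Hook::ReplaceNodeResults,
              "loads are routed to ReplaceNodeResults");
// A volatile i64 store only consumes the illegal type as an operand.
static_assert(customLoweringHook(IllegalPart::Operand) ==
                  Hook::LowerOperation,
              "stores are routed to LowerOperation");
```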

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// ARM Scheduler Hooks		// ARM Scheduler Hooks
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
[1,381 lines not shown]	if (MI.getOpcode() == ARM::MEMCPY) {
attachMEMCPYScratchRegs(Subtarget, MI, Node);		attachMEMCPYScratchRegs(Subtarget, MI, Node);
return;		return;
}		}

const MCInstrDesc *MCID = &MI.getDesc();		const MCInstrDesc *MCID = &MI.getDesc();
// Adjust potentially 's' setting instructions after isel, i.e. ADC, SBC, RSB,		// Adjust potentially 's' setting instructions after isel, i.e. ADC, SBC, RSB,
// RSC. Coming out of isel, they have an implicit CPSR def, but the optional		// RSC. Coming out of isel, they have an implicit CPSR def, but the optional
// operand is still set to noreg. If needed, set the optional operand's		// operand is still set to noreg. If needed, set the optional operand's
// register to CPSR, and remove the redundant implicit def.		// register to CPSR, and remove the redundant implicit def.
		efriedma: Oh, I didn't notice this before. There's a problem here: if we're trying to generate LDRD for the sake of the extra guarantees provided by some targets, transforming that to LDM is wrong; it doesn't have the same guarantee, and therefore could cause unpredictable, subtle problems. Probably we want to allocate a GPRPair, instead of allocating two separate registers and trying to tie them together with a hint. Maybe requires defining a new pseudo-instruction that takes a GPRPair instead of two GPRs. Or I'd be okay with just restricting the optimization to Thumb2 for now, if you don't want to do the extra work right now.
//		//
// e.g. ADCS (..., implicit-def CPSR) -> ADC (... opt:def CPSR).		// e.g. ADCS (..., implicit-def CPSR) -> ADC (... opt:def CPSR).

// Rename pseudo opcodes.		// Rename pseudo opcodes.
unsigned NewOpc = convertAddSubFlagsOpcode(MI.getOpcode());		unsigned NewOpc = convertAddSubFlagsOpcode(MI.getOpcode());
unsigned ccOutIdx;		unsigned ccOutIdx;
if (NewOpc) {		if (NewOpc) {
const ARMBaseInstrInfo *TII = Subtarget->getInstrInfo();		const ARMBaseInstrInfo *TII = Subtarget->getInstrInfo();
[7,546 lines not shown]

llvm/lib/Target/ARM/ARMInstrInfo.td

	[239 lines not shown]
	def ARMsmlaltb : SDNode<"ARMISD::SMLALTB", SDT_LongMac, []>;			def ARMsmlaltb : SDNode<"ARMISD::SMLALTB", SDT_LongMac, []>;
	def ARMsmlaltt : SDNode<"ARMISD::SMLALTT", SDT_LongMac, []>;			def ARMsmlaltt : SDNode<"ARMISD::SMLALTT", SDT_LongMac, []>;

	def ARMqadd8b : SDNode<"ARMISD::QADD8b", SDT_ARMAnd, []>;			def ARMqadd8b : SDNode<"ARMISD::QADD8b", SDT_ARMAnd, []>;
	def ARMqsub8b : SDNode<"ARMISD::QSUB8b", SDT_ARMAnd, []>;			def ARMqsub8b : SDNode<"ARMISD::QSUB8b", SDT_ARMAnd, []>;
	def ARMqadd16b : SDNode<"ARMISD::QADD16b", SDT_ARMAnd, []>;			def ARMqadd16b : SDNode<"ARMISD::QADD16b", SDT_ARMAnd, []>;
	def ARMqsub16b : SDNode<"ARMISD::QSUB16b", SDT_ARMAnd, []>;			def ARMqsub16b : SDNode<"ARMISD::QSUB16b", SDT_ARMAnd, []>;

				def SDT_ARMldrd : SDTypeProfile<2, 1, [SDTCisVT<0, i32>, SDTCisSameAs<0, 1>, SDTCisPtrTy<2>]>;
				def ARMldrd : SDNode<"ARMISD::LDRD", SDT_ARMldrd, [SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;

				def SDT_ARMstrd : SDTypeProfile<0, 3, [SDTCisVT<0, i32>, SDTCisSameAs<0, 1>, SDTCisPtrTy<2>]>;
				def ARMstrd : SDNode<"ARMISD::STRD", SDT_ARMstrd, [SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;

	// Vector operations shared between NEON and MVE			// Vector operations shared between NEON and MVE

	def ARMvdup : SDNode<"ARMISD::VDUP", SDTypeProfile<1, 1, [SDTCisVec<0>]>>;			def ARMvdup : SDNode<"ARMISD::VDUP", SDTypeProfile<1, 1, [SDTCisVec<0>]>>;

	// VDUPLANE can produce a quad-register result from a double-register source,			// VDUPLANE can produce a quad-register result from a double-register source,
	// so the result is not constrained to match the source.			// so the result is not constrained to match the source.
	def ARMvduplane : SDNode<"ARMISD::VDUPLANE",			def ARMvduplane : SDNode<"ARMISD::VDUPLANE",
	SDTypeProfile<1, 2, [SDTCisVec<0>, SDTCisVec<1>,			SDTypeProfile<1, 2, [SDTCisVec<0>, SDTCisVec<1>,
	[2,475 lines not shown]

	let mayLoad = 1, hasSideEffects = 0, hasExtraDefRegAllocReq = 1 in {			let mayLoad = 1, hasSideEffects = 0, hasExtraDefRegAllocReq = 1 in {
	// Load doubleword			// Load doubleword
	def LDRD : AI3ld<0b1101, 0, (outs GPR:$Rt, GPR:$Rt2), (ins addrmode3:$addr),			def LDRD : AI3ld<0b1101, 0, (outs GPR:$Rt, GPR:$Rt2), (ins addrmode3:$addr),
	LdMiscFrm, IIC_iLoad_d_r, "ldrd", "\t$Rt, $Rt2, $addr", []>,			LdMiscFrm, IIC_iLoad_d_r, "ldrd", "\t$Rt, $Rt2, $addr", []>,
	Requires<[IsARM, HasV5TE]>;			Requires<[IsARM, HasV5TE]>;
	}			}

				let mayLoad = 1, hasSideEffects = 0, hasNoSchedulingInfo = 1 in {
				efriedma: hasExtraDefRegAllocReq shouldn't be necessary; the regular ldrd/strd are weird because the register allocation constraint isn't expressed correctly by the operand types, but you don't have that problem here.
				def LOADDUAL : ARMPseudoInst<(outs GPRPairOp:$Rt), (ins addrmode3:$addr),
				64, IIC_iLoad_d_r, []>,
				Requires<[IsARM, HasV5TE]> {
				let AM = AddrMode3;
				}
				}
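
For context on the GPRPair discussion: the "register allocation constraint" efriedma mentions is, as far as the ARM-mode (A32) encoding goes, that the first register must be even-numbered, the second must be the next register, and r14 cannot be the first register. The LDRD definition above takes two plain GPR operands, which cannot express this, whereas a single GPRPairOp operand can. The predicate below is a minimal sketch of that constraint; isValidA32LdrdPair is a hypothetical helper, not an LLVM API.

```cpp
// Hypothetical helper, not an LLVM API.
// Models the A32 LDRD/STRD register constraint: Rt is even, Rt2 == Rt + 1,
// and Rt is not r14 (which would make Rt2 the program counter, r15).
constexpr bool isValidA32LdrdPair(unsigned Rt, unsigned Rt2) {
  return Rt % 2 == 0 && Rt2 == Rt + 1 && Rt != 14;
}

static_assert(isValidA32LdrdPair(0, 1), "r0/r1 is a valid pair");
static_assert(!isValidA32LdrdPair(1, 2), "first register must be even");
static_assert(!isValidA32LdrdPair(0, 2), "registers must be consecutive");
static_assert(!isValidA32LdrdPair(14, 15), "r14/r15 is not allowed");
```

The Thumb-2 encoding has separate fields for the two registers, which is why t2LDRDi8 and t2STRDi8 can keep two independent rGPR operands.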

	def LDA : AIldracq<0b00, (outs GPR:$Rt), (ins addr_offset_none:$addr),			def LDA : AIldracq<0b00, (outs GPR:$Rt), (ins addr_offset_none:$addr),
	NoItinerary, "lda", "\t$Rt, $addr", []>;			NoItinerary, "lda", "\t$Rt, $addr", []>;
	def LDAB : AIldracq<0b10, (outs GPR:$Rt), (ins addr_offset_none:$addr),			def LDAB : AIldracq<0b10, (outs GPR:$Rt), (ins addr_offset_none:$addr),
	NoItinerary, "ldab", "\t$Rt, $addr", []>;			NoItinerary, "ldab", "\t$Rt, $addr", []>;
	def LDAH : AIldracq<0b11, (outs GPR:$Rt), (ins addr_offset_none:$addr),			def LDAH : AIldracq<0b11, (outs GPR:$Rt), (ins addr_offset_none:$addr),
	NoItinerary, "ldah", "\t$Rt, $addr", []>;			NoItinerary, "ldah", "\t$Rt, $addr", []>;

	// Indexed loads			// Indexed loads
	[262 lines not shown]
	let mayStore = 1, hasSideEffects = 0, hasExtraSrcRegAllocReq = 1 in {			let mayStore = 1, hasSideEffects = 0, hasExtraSrcRegAllocReq = 1 in {
	def STRD : AI3str<0b1111, (outs), (ins GPR:$Rt, GPR:$Rt2, addrmode3:$addr),			def STRD : AI3str<0b1111, (outs), (ins GPR:$Rt, GPR:$Rt2, addrmode3:$addr),
	StMiscFrm, IIC_iStore_d_r, "strd", "\t$Rt, $Rt2, $addr", []>,			StMiscFrm, IIC_iStore_d_r, "strd", "\t$Rt, $Rt2, $addr", []>,
	Requires<[IsARM, HasV5TE]> {			Requires<[IsARM, HasV5TE]> {
	let Inst{21} = 0;			let Inst{21} = 0;
	}			}
	}			}

				let mayStore = 1, hasSideEffects = 0, hasNoSchedulingInfo = 1 in {
				def STOREDUAL : ARMPseudoInst<(outs), (ins GPRPairOp:$Rt, addrmode3:$addr),
				64, IIC_iStore_d_r, []>,
				Requires<[IsARM, HasV5TE]> {
				let AM = AddrMode3;
				}
				}

	// Indexed stores			// Indexed stores
	multiclass AI2_stridx<bit isByte, string opc,			multiclass AI2_stridx<bit isByte, string opc,
	InstrItinClass iii, InstrItinClass iir> {			InstrItinClass iii, InstrItinClass iir> {
	def _PRE_IMM : AI2ldstidx<0, isByte, 1, (outs GPR:$Rn_wb),			def _PRE_IMM : AI2ldstidx<0, isByte, 1, (outs GPR:$Rn_wb),
	(ins GPR:$Rt, addrmode_imm12_pre:$addr), IndexModePre,			(ins GPR:$Rt, addrmode_imm12_pre:$addr), IndexModePre,
	StFrm, iii,			StFrm, iii,
	opc, "\t$Rt, $addr!",			opc, "\t$Rt, $addr!",
	"$addr.base = $Rn_wb,@earlyclobber $Rn_wb", []> {			"$addr.base = $Rn_wb,@earlyclobber $Rn_wb", []> {
	[3,275 lines not shown]

llvm/lib/Target/ARM/ARMInstrThumb2.td

[264 lines not shown]	def t2am_imm8_offset : MemOperand,
[], [SDNPWantRoot]> {		[], [SDNPWantRoot]> {
let PrintMethod = "printT2AddrModeImm8OffsetOperand";		let PrintMethod = "printT2AddrModeImm8OffsetOperand";
let EncoderMethod = "getT2AddrModeImm8OffsetOpValue";		let EncoderMethod = "getT2AddrModeImm8OffsetOpValue";
let DecoderMethod = "DecodeT2Imm8";		let DecoderMethod = "DecodeT2Imm8";
}		}

// t2addrmode_imm8s4 := reg +/- (imm8 << 2)		// t2addrmode_imm8s4 := reg +/- (imm8 << 2)
def MemImm8s4OffsetAsmOperand : AsmOperandClass {let Name = "MemImm8s4Offset";}		def MemImm8s4OffsetAsmOperand : AsmOperandClass {let Name = "MemImm8s4Offset";}
class T2AddrMode_Imm8s4 : MemOperand {		class T2AddrMode_Imm8s4 : MemOperand,
		ComplexPattern<i32, 2, "SelectT2AddrModeImm8<2>", []> {
let EncoderMethod = "getT2AddrModeImm8s4OpValue";		let EncoderMethod = "getT2AddrModeImm8s4OpValue";
let DecoderMethod = "DecodeT2AddrModeImm8s4";		let DecoderMethod = "DecodeT2AddrModeImm8s4";
let ParserMatchClass = MemImm8s4OffsetAsmOperand;		let ParserMatchClass = MemImm8s4OffsetAsmOperand;
let MIOperandInfo = (ops GPR:$base, i32imm:$offsimm);		let MIOperandInfo = (ops GPR:$base, i32imm:$offsimm);
}		}

def t2addrmode_imm8s4 : T2AddrMode_Imm8s4 {		def t2addrmode_imm8s4 : T2AddrMode_Imm8s4 {
let PrintMethod = "printT2AddrModeImm8s4Operand<false>";		let PrintMethod = "printT2AddrModeImm8s4Operand<false>";
[1,161 lines not shown]	defm t2LDRSH : T2I_ld<1, 0b01, "ldrsh", IIC_iLoad_bh_i, IIC_iLoad_bh_si,
GPRnopc, sextloadi16>;		GPRnopc, sextloadi16>;
defm t2LDRSB : T2I_ld<1, 0b00, "ldrsb", IIC_iLoad_bh_i, IIC_iLoad_bh_si,		defm t2LDRSB : T2I_ld<1, 0b00, "ldrsb", IIC_iLoad_bh_i, IIC_iLoad_bh_si,
GPRnopc, sextloadi8>;		GPRnopc, sextloadi8>;

let mayLoad = 1, hasSideEffects = 0, hasExtraDefRegAllocReq = 1 in {		let mayLoad = 1, hasSideEffects = 0, hasExtraDefRegAllocReq = 1 in {
// Load doubleword		// Load doubleword
def t2LDRDi8 : T2Ii8s4<1, 0, 1, (outs rGPR:$Rt, rGPR:$Rt2),		def t2LDRDi8 : T2Ii8s4<1, 0, 1, (outs rGPR:$Rt, rGPR:$Rt2),
(ins t2addrmode_imm8s4:$addr),		(ins t2addrmode_imm8s4:$addr),
IIC_iLoad_d_i, "ldrd", "\t$Rt, $Rt2, $addr", "", []>,		IIC_iLoad_d_i, "ldrd", "\t$Rt, $Rt2, $addr", "",
		[(set rGPR:$Rt, rGPR:$Rt2, (ARMldrd t2addrmode_imm8s4:$addr))]>,
Sched<[WriteLd]>;		Sched<[WriteLd]>;
} // mayLoad = 1, hasSideEffects = 0, hasExtraDefRegAllocReq = 1		} // mayLoad = 1, hasSideEffects = 0, hasExtraDefRegAllocReq = 1

// zextload i1 -> zextload i8		// zextload i1 -> zextload i8
def : T2Pat<(zextloadi1 t2addrmode_imm12:$addr),		def : T2Pat<(zextloadi1 t2addrmode_imm12:$addr),
(t2LDRBi12 t2addrmode_imm12:$addr)>;		(t2LDRBi12 t2addrmode_imm12:$addr)>;
def : T2Pat<(zextloadi1 t2addrmode_negimm8:$addr),		def : T2Pat<(zextloadi1 t2addrmode_negimm8:$addr),
(t2LDRBi8 t2addrmode_negimm8:$addr)>;		(t2LDRBi8 t2addrmode_negimm8:$addr)>;
[164 lines not shown]	defm t2STRB:T2I_st<0b00,"strb", IIC_iStore_bh_i, IIC_iStore_bh_si,
rGPR, truncstorei8>;		rGPR, truncstorei8>;
defm t2STRH:T2I_st<0b01,"strh", IIC_iStore_bh_i, IIC_iStore_bh_si,		defm t2STRH:T2I_st<0b01,"strh", IIC_iStore_bh_i, IIC_iStore_bh_si,
rGPR, truncstorei16>;		rGPR, truncstorei16>;

// Store doubleword		// Store doubleword
let mayStore = 1, hasSideEffects = 0, hasExtraSrcRegAllocReq = 1 in		let mayStore = 1, hasSideEffects = 0, hasExtraSrcRegAllocReq = 1 in
def t2STRDi8 : T2Ii8s4<1, 0, 0, (outs),		def t2STRDi8 : T2Ii8s4<1, 0, 0, (outs),
(ins rGPR:$Rt, rGPR:$Rt2, t2addrmode_imm8s4:$addr),		(ins rGPR:$Rt, rGPR:$Rt2, t2addrmode_imm8s4:$addr),
IIC_iStore_d_r, "strd", "\t$Rt, $Rt2, $addr", "", []>,		IIC_iStore_d_r, "strd", "\t$Rt, $Rt2, $addr", "",
		[(ARMstrd rGPR:$Rt, rGPR:$Rt2, t2addrmode_imm8s4:$addr)]>,
Sched<[WriteST]>;		Sched<[WriteST]>;

// Indexed stores		// Indexed stores

let mayStore = 1, hasSideEffects = 0 in {		let mayStore = 1, hasSideEffects = 0 in {
def t2STR_PRE : T2Ipreldst<0, 0b10, 0, 1, (outs GPRnopc:$Rn_wb),		def t2STR_PRE : T2Ipreldst<0, 0b10, 0, 1, (outs GPRnopc:$Rn_wb),
(ins GPRnopc:$Rt, t2addrmode_imm8_pre:$addr),		(ins GPRnopc:$Rt, t2addrmode_imm8_pre:$addr),
AddrModeT2_i8, IndexModePre, IIC_iStore_iu,		AddrModeT2_i8, IndexModePre, IIC_iStore_iu,
[3,845 lines not shown]

llvm/test/CodeGen/ARM/i64_volatile_load_store.ll

This file was added.

				; RUN: llc -mtriple=armv5e-arm-none-eabi %s -o - | FileCheck %s --check-prefixes=CHECK-ARMV5TE,CHECK
				; RUN: llc -mtriple=thumbv6t2-arm-none-eabi %s -o - | FileCheck %s --check-prefixes=CHECK-T2,CHECK
				; RUN: llc -mtriple=armv4t-arm-none-eabi %s -o - | FileCheck %s --check-prefixes=CHECK-ARMV4T,CHECK

				@x = common dso_local global i64 0, align 8
				@y = common dso_local global i64 0, align 8

				define void @test() {
				entry:
				; CHECK-LABEL: test:
				; CHECK-ARMV5TE: ldr [[ADDR0:r[0-9]+]]
				; CHECK-ARMV5TE-NEXT: ldr [[ADDR1:r[0-9]+]]
				; CHECK-ARMV5TE-NEXT: ldrd [[R0:r[0-9]+]], [[R1:r[0-9]+]], {{\[}}[[ADDR0]]]
				; CHECK-ARMV5TE-NEXT: strd [[R0]], [[R1]], {{\[}}[[ADDR1]]]
				; CHECK-T2: movw [[ADDR0:r[0-9]+]], :lower16:x
				; CHECK-T2-NEXT: movw [[ADDR1:r[0-9]+]], :lower16:y
				; CHECK-T2-NEXT: movt [[ADDR0]], :upper16:x
				; CHECK-T2-NEXT: movt [[ADDR1]], :upper16:y
				; CHECK-T2-NEXT: ldrd [[R0:r[0-9]+]], [[R1:r[0-9]+]], {{\[}}[[ADDR0]]]
				; CHECK-T2-NEXT: strd [[R0]], [[R1]], {{\[}}[[ADDR1]]]
				; CHECK-ARMV4T: ldr [[ADDR0:r[0-9]+]]
				; CHECK-ARMV4T-NEXT: ldr [[ADDR1:r[0-9]+]]
				; CHECK-ARMV4T-NEXT: ldr [[R1:r[0-9]+]], {{\[}}[[ADDR0]]]
				; CHECK-ARMV4T-NEXT: ldr [[R0:r[0-9]+]], {{\[}}[[ADDR0]], #4]
				; CHECK-ARMV4T-NEXT: str [[R0]], {{\[}}[[ADDR1]], #4]
				; CHECK-ARMV4T-NEXT: str [[R1]], {{\[}}[[ADDR1]]]
				%0 = load volatile i64, i64* @x, align 8
				store volatile i64 %0, i64* @y, align 8
				ret void
				}

				define void @test_offset() {
				entry:
				; CHECK-LABEL: test_offset:
				; CHECK-ARMV5TE: ldr [[ADDR0:r[0-9]+]]
				; CHECK-ARMV5TE-NEXT: ldr [[ADDR1:r[0-9]+]]
				; CHECK-ARMV5TE-NEXT: ldrd [[R0:r[0-9]+]], [[R1:r[0-9]+]], {{\[}}[[ADDR0]], #-4]
				; CHECK-ARMV5TE-NEXT: strd [[R0]], [[R1]], {{\[}}[[ADDR1]], #-4]
				; CHECK-T2: movw [[ADDR0:r[0-9]+]], :lower16:x
				; CHECK-T2-NEXT: movw [[ADDR1:r[0-9]+]], :lower16:y
				; CHECK-T2-NEXT: movt [[ADDR0]], :upper16:x
				; CHECK-T2-NEXT: movt [[ADDR1]], :upper16:y
				; CHECK-T2-NEXT: ldrd [[R0:r[0-9]+]], [[R1:r[0-9]+]], {{\[}}[[ADDR0]], #-4]
				; CHECK-T2-NEXT: strd [[R0]], [[R1]], {{\[}}[[ADDR1]], #-4]
				; CHECK-ARMV4T: ldr [[ADDR0:r[0-9]+]]
				; CHECK-ARMV4T-NEXT: ldr [[ADDR1:r[0-9]+]]
				; CHECK-ARMV4T-NEXT: ldr [[R0:r[0-9]+]], {{\[}}[[ADDR0]], #-4]
				; CHECK-ARMV4T-NEXT: ldr [[R1:r[0-9]+]], {{\[}}[[ADDR0]]]
				; CHECK-ARMV4T-NEXT: str [[R1]], {{\[}}[[ADDR1]]]
				; CHECK-ARMV4T-NEXT: str [[R0]], {{\[}}[[ADDR1]], #-4]
				%0 = load volatile i64, i64* bitcast (i8* getelementptr (i8, i8* bitcast (i64* @x to i8*), i32 -4) to i64*), align 8
				store volatile i64 %0, i64* bitcast (i8* getelementptr (i8, i8* bitcast (i64* @y to i8*), i32 -4) to i64*), align 8
				ret void
				}

				define void @test_offset_1() {
				; CHECK-LABEL: test_offset_1:
				; CHECK-ARMV5TE: ldr [[ADDR0:r[0-9]+]]
				; CHECK-ARMV5TE-NEXT: ldr [[ADDR1:r[0-9]+]]
				; CHECK-ARMV5TE-NEXT: ldrd [[R0:r[0-9]+]], [[R1:r[0-9]+]], {{\[}}[[ADDR0]], #255]
				; CHECK-ARMV5TE-NEXT: strd [[R0]], [[R1]], {{\[}}[[ADDR1]], #255]
				; CHECK-T2: adds [[ADDR0:r[0-9]+]], #255
				; CHECK-T2-NEXT: adds [[ADDR1:r[0-9]+]], #255
				; CHECK-T2-NEXT: ldrd [[R0:r[0-9]+]], [[R1:r[0-9]+]], {{\[}}[[ADDR0]]]
				; CHECK-T2-NEXT: strd [[R0]], [[R1]], {{\[}}[[ADDR1]]]
				; CHECK-ARMV4T: ldr [[ADDR0:r[0-9]+]]
				; CHECK-ARMV4T-NEXT: ldr [[ADDR1:r[0-9]+]]
				; CHECK-ARMV4T-NEXT: ldr [[R0:r[0-9]+]], {{\[}}[[ADDR0]], #255]
				; CHECK-ARMV4T-NEXT: ldr [[R1:r[0-9]+]], {{\[}}[[ADDR0]], #259]
				; CHECK-ARMV4T-NEXT: str [[R1]], {{\[}}[[ADDR1]], #259]
				; CHECK-ARMV4T-NEXT: str [[R0]], {{\[}}[[ADDR1]], #255]
				entry:
				%0 = load volatile i64, i64* bitcast (i8* getelementptr (i8, i8* bitcast (i64* @x to i8*), i32 255) to i64*), align 8
				store volatile i64 %0, i64* bitcast (i8* getelementptr (i8, i8* bitcast (i64* @y to i8*), i32 255) to i64*), align 8
				ret void
				}

				define void @test_offset_2() {
				; CHECK-LABEL: test_offset_2:
				; CHECK-ARMV5TE: ldr [[ADDR0:r[0-9]+]]
				; CHECK-ARMV5TE-NEXT: ldr [[ADDR1:r[0-9]+]]
				; CHECK-ARMV5TE-NEXT: add [[ADDR0]], [[ADDR0]], #256
				; CHECK-ARMV5TE-NEXT: add [[ADDR1]], [[ADDR1]], #256
				; CHECK-ARMV5TE-NEXT: ldrd [[R0:r[0-9]+]], [[R1:r[0-9]+]], {{\[}}[[ADDR0]]]
				; CHECK-ARMV5TE-NEXT: strd [[R0]], [[R1]], {{\[}}[[ADDR1]]]
				; CHECK-T2: movw [[ADDR0:r[0-9]+]], :lower16:x
				; CHECK-T2-NEXT: movw [[ADDR1:r[0-9]+]], :lower16:y
				; CHECK-T2-NEXT: movt [[ADDR0]], :upper16:x
				; CHECK-T2-NEXT: movt [[ADDR1]], :upper16:y
				; CHECK-T2-NEXT: ldrd [[R0:r[0-9]+]], [[R1:r[0-9]+]], {{\[}}[[ADDR0]], #256]
				; CHECK-T2-NEXT: strd [[R0]], [[R1]], {{\[}}[[ADDR1]], #256]
				; CHECK-ARMV4T: ldr [[ADDR0:r[0-9]+]]
				; CHECK-ARMV4T-NEXT: ldr [[ADDR1:r[0-9]+]]
				; CHECK-ARMV4T-NEXT: ldr [[R0:r[0-9]+]], {{\[}}[[ADDR0]], #256]
				; CHECK-ARMV4T-NEXT: ldr [[R1:r[0-9]+]], {{\[}}[[ADDR0]], #260]
				; CHECK-ARMV4T-NEXT: str [[R1]], {{\[}}[[ADDR1]], #260]
				; CHECK-ARMV4T-NEXT: str [[R0]], {{\[}}[[ADDR1]], #256]
				entry:
				%0 = load volatile i64, i64* bitcast (i8* getelementptr (i8, i8* bitcast (i64* @x to i8*), i32 256) to i64*), align 8
				store volatile i64 %0, i64* bitcast (i8* getelementptr (i8, i8* bitcast (i64* @y to i8*), i32 256) to i64*), align 8
				ret void
				}

				define void @test_offset_3() {
				; CHECK-LABEL: test_offset_3:
				; CHECK-ARMV5TE: ldr [[ADDR0:r[0-9]+]]
				; CHECK-ARMV5TE-NEXT: ldr [[ADDR1:r[0-9]+]]
				; CHECK-ARMV5TE-NEXT: add [[ADDR0]], [[ADDR0]], #1020
				; CHECK-ARMV5TE-NEXT: add [[ADDR1]], [[ADDR1]], #1020
				; CHECK-ARMV5TE-NEXT: ldrd [[R0:r[0-9]+]], [[R1:r[0-9]+]], {{\[}}[[ADDR0]]]
				; CHECK-ARMV5TE-NEXT: strd [[R0]], [[R1]], {{\[}}[[ADDR1]]]
				; CHECK-T2: movw [[ADDR0:r[0-9]+]], :lower16:x
				; CHECK-T2-NEXT: movw [[ADDR1:r[0-9]+]], :lower16:y
				; CHECK-T2-NEXT: movt [[ADDR0]], :upper16:x
				; CHECK-T2-NEXT: movt [[ADDR1]], :upper16:y
				; CHECK-T2-NEXT: ldrd [[R0:r[0-9]+]], [[R1:r[0-9]+]], {{\[}}[[ADDR0]], #1020]
				; CHECK-T2-NEXT: strd [[R0]], [[R1]], {{\[}}[[ADDR1]], #1020]
				; CHECK-ARMV4T: ldr [[ADDR0:r[0-9]+]]
				; CHECK-ARMV4T-NEXT: ldr [[ADDR1:r[0-9]+]]
				; CHECK-ARMV4T-NEXT: ldr [[R0:r[0-9]+]], {{\[}}[[ADDR0]], #1020]
				; CHECK-ARMV4T-NEXT: ldr [[R1:r[0-9]+]], {{\[}}[[ADDR0]], #1024]
				; CHECK-ARMV4T-NEXT: str [[R1]], {{\[}}[[ADDR1]], #1024]
				; CHECK-ARMV4T-NEXT: str [[R0]], {{\[}}[[ADDR1]], #1020]
				entry:
				%0 = load volatile i64, i64* bitcast (i8* getelementptr (i8, i8* bitcast (i64* @x to i8*), i32 1020) to i64*), align 8
				store volatile i64 %0, i64* bitcast (i8* getelementptr (i8, i8* bitcast (i64* @y to i8*), i32 1020) to i64*), align 8
				ret void
				}

				define void @test_offset_4() {
				; CHECK-LABEL: test_offset_4:
				; CHECK-ARMV5TE: ldr [[ADDR0:r[0-9]+]]
				; CHECK-ARMV5TE: ldr [[ADDR1:r[0-9]+]]
				; CHECK-ARMV5TE-NEXT: add [[ADDR0]], [[ADDR0]], #1024
				; CHECK-ARMV5TE-NEXT: add [[ADDR1]], [[ADDR1]], #1024
				; CHECK-ARMV5TE-NEXT: ldrd [[R0:r[0-9]+]], [[R1:r[0-9]+]], {{\[}}[[ADDR0]]]
				; CHECK-ARMV5TE-NEXT: strd [[R0]], [[R1]], {{\[}}[[ADDR1]]]
				; CHECK-T2: movw [[ADDR1:r[0-9]+]], :lower16:y
				; CHECK-T2-NEXT: movw [[ADDR0:r[0-9]+]], :lower16:x
				; CHECK-T2-NEXT: movt [[ADDR1]], :upper16:y
				; CHECK-T2-NEXT: movt [[ADDR0]], :upper16:x
				; CHECK-T2-NEXT: add.w [[ADDR0]], [[ADDR0]], #1024
				; CHECK-T2-NEXT: add.w [[ADDR1]], [[ADDR1]], #1024
				; CHECK-T2-NEXT: ldrd [[R0:r[0-9]+]], [[R1:r[0-9]+]], {{\[}}[[ADDR0]]]
				; CHECK-T2-NEXT: strd [[R0]], [[R1]], {{\[}}[[ADDR1]]]
				; CHECK-ARMV4T: ldr [[ADDR0:r[0-9]+]]
				; CHECK-ARMV4T-NEXT: ldr [[ADDR1:r[0-9]+]]
				; CHECK-ARMV4T-NEXT: ldr [[R0:r[0-9]+]], {{\[}}[[ADDR0]], #1024]
				; CHECK-ARMV4T-NEXT: ldr [[R1:r[0-9]+]], {{\[}}[[ADDR0]], #1028]
				; CHECK-ARMV4T-NEXT: str [[R1]], {{\[}}[[ADDR1]], #1028]
				; CHECK-ARMV4T-NEXT: str [[R0]], {{\[}}[[ADDR1]], #1024]
				entry:
				%0 = load volatile i64, i64* bitcast (i8* getelementptr (i8, i8* bitcast (i64* @x to i8*), i32 1024) to i64*), align 8
				store volatile i64 %0, i64* bitcast (i8* getelementptr (i8, i8* bitcast (i64* @y to i8*), i32 1024) to i64*), align 8
				ret void
				}

				define i64 @test_stack() {
				; CHECK-LABEL: test_stack:
				; CHECK-ARMV5TE: sub sp, sp, #80
				; CHECK-ARMV5TE-NEXT: mov [[R0:r[0-9]+]], #0
				; CHECK-ARMV5TE-NEXT: mov [[R1:r[0-9]+]], #1
				; CHECK-ARMV5TE-NEXT: strd [[R1]], [[R0]], [sp, #8]
				; CHECK-ARMV5TE-NEXT: ldrd r0, r1, [sp, #8]
				; CHECK-ARMV5TE-NEXT: add sp, sp, #80
				; CHECK-ARMV5TE-NEXT: bx lr
				; CHECK-T2: sub sp, #80
				; CHECK-T2-NEXT: movs [[R0:r[0-9]+]], #0
				; CHECK-T2-NEXT: movs [[R1:r[0-9]+]], #1
				; CHECK-T2-NEXT: strd [[R1]], [[R0]], [sp, #8]
				; CHECK-T2-NEXT: ldrd r0, r1, [sp, #8]
				; CHECK-T2-NEXT: add sp, #80
				; CHECK-T2-NEXT: bx lr
				; CHECK-ARMV4T: sub sp, sp, #80
				; CHECK-ARMV4T-NEXT: mov [[R0:r[0-9]+]], #0
				; CHECK-ARMV4T-NEXT: str [[R0]], [sp, #12]
				; CHECK-ARMV4T-NEXT: mov [[R1:r[0-9]+]], #1
				; CHECK-ARMV4T-NEXT: str [[R1]], [sp, #8]
				; CHECK-ARMV4T-NEXT: ldr r0, [sp, #8]
				; CHECK-ARMV4T-NEXT: ldr r1, [sp, #12]
				; CHECK-ARMV4T-NEXT: add sp, sp, #80
				; CHECK-ARMV4T-NEXT: bx lr
				entry:
				%a = alloca [10 x i64], align 8
				%arrayidx = getelementptr inbounds [10 x i64], [10 x i64]* %a, i32 0, i32 1
				store volatile i64 1, i64* %arrayidx, align 8
				%arrayidx1 = getelementptr inbounds [10 x i64], [10 x i64]* %a, i32 0, i32 1
				%0 = load volatile i64, i64* %arrayidx1, align 8
				ret i64 %0
				}
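
The offsets used in test_offset through test_offset_4 probe the immediate ranges of the two addressing modes: the CHECK lines fold the offset into LDRD/STRD when it fits and fall back to a separate add when it does not. The sketch below restates those ranges as predicates. The Thumb-2 rule follows the "reg +/- (imm8 << 2)" comment on t2addrmode_imm8s4; the ARM-mode range of +/-255 is taken from the 8-bit immediate of the addrmode3 encoding. Both function names are hypothetical, not LLVM APIs.

```cpp
#include <cstdint>

// Hypothetical helpers, not LLVM APIs. They model only the immediate-offset
// ranges, not the full address selection done by the backend.

// ARM-mode LDRD/STRD (addrmode3): 8-bit immediate, added or subtracted.
constexpr bool fitsArmLdrdImm(std::int32_t Off) {
  return Off >= -255 && Off <= 255;
}

// Thumb-2 LDRD/STRD (t2addrmode_imm8s4): reg +/- (imm8 << 2), so the offset
// must be a multiple of 4 within +/-1020.
constexpr bool fitsThumb2LdrdImm(std::int32_t Off) {
  return Off % 4 == 0 && Off >= -1020 && Off <= 1020;
}

// Each case mirrors which CHECK lines fold the offset in the test file.
static_assert(fitsArmLdrdImm(-4) && fitsThumb2LdrdImm(-4), "test_offset");
static_assert(fitsArmLdrdImm(255) && !fitsThumb2LdrdImm(255), "test_offset_1");
static_assert(!fitsArmLdrdImm(256) && fitsThumb2LdrdImm(256), "test_offset_2");
static_assert(!fitsArmLdrdImm(1020) && fitsThumb2LdrdImm(1020), "test_offset_3");
static_assert(!fitsArmLdrdImm(1024) && !fitsThumb2LdrdImm(1024), "test_offset_4");
```

The ARMv4T checks are unaffected by either range because that target has no LDRD/STRD and keeps the pair of LDR/STR instructions.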

Diff 266796
