This is an archive of the discontinued LLVM Phabricator instance.

Differential D70072

[ARM] Improve codegen of volatile load/store of i64
Closed · Public

Authored by vhscampos on Nov 11 2019, 3:53 AM.

Details

Reviewers

dmgreen
efriedma
john.brawn
nickdesaulniers

Commits

rGc010d4d19550: [ARM] Improve codegen of volatile load/store of i64
rG8a1255322318: [ARM] Improve codegen of volatile load/store of i64
rG60e0120c913d: [ARM] Improve codegen of volatile load/store of i64
rGbbcf1c3496ce: [ARM] Improve codegen of volatile load/store of i64

Summary

Instead of generating two i32 instructions for each load or store of a volatile
i64 value (two LDRs or STRs), now emit LDRD/STRD.

These improvements cover architectures implementing ARMv5TE or Thumb-2.

The code generation explicitly deviates from using the register-offset
variant of LDRD/STRD. In this variant, the register allocated to the
register-offset cannot be reused in any of the remaining operands. Such
restriction seems to be non-trivial to implement in LLVM, thus it is
left as a to-do.
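
For context, here is a minimal sketch of C++ source that exercises this path. It assumes a typical Clang lowering, in which the two volatile accesses become `load volatile i64` and `store volatile i64`, the pattern checked in i64_volatile_load_store.ll. The globals mirror `@x` and `@y` from the test; `copy_volatile` is a hypothetical name that does not appear in the patch.

```cpp
#include <cstdint>

// Globals mirroring @x and @y in i64_volatile_load_store.ll.
volatile std::uint64_t x = 0;
volatile std::uint64_t y = 0;

// Hypothetical helper: one volatile 64-bit load followed by one volatile
// 64-bit store. On ARMv5TE or Thumb-2 targets, the patch aims to select a
// single LDRD and a single STRD here instead of two LDRs and two STRs.
void copy_volatile() {
  std::uint64_t tmp = x; // volatile i64 load
  y = tmp;               // volatile i64 store
}
```

Compiling such a function for one of the triples used in the test's RUN lines and inspecting the assembly is one way to observe the difference.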

Diff Detail

Repository

rG LLVM Github Monorepo

Build Status

Buildable 40736
Build 40867: arc lint + arc unit

Event Timeline

vhscampos created this revision. Nov 11 2019, 3:53 AM

Herald added a project: Restricted Project. Nov 11 2019, 3:53 AM

Herald added subscribers: llvm-commits, hiraditya, kristof.beyls.

Harbormaster completed remote builds in B40736: Diff 228663. Nov 11 2019, 3:58 AM

vhscampos added a reviewer: dmgreen. Nov 11 2019, 4:03 AM

vhscampos added reviewers: efriedma, john.brawn. Nov 11 2019, 5:13 AM

The architecture specification provides limited guarantees here, but I guess this doesn't do any harm.

llvm/lib/Target/ARM/ARMISelLowering.cpp
8969	I'd prefer not to exclude extending loads here. Could lead to weird cases where we miss the transform.
8983	Please don't make MachineSDNodes this early; it might appear to mostly work, but other code is not expecting MachineSDNodes at this point. See ARMISelLowering.h for how to introduce a target-specific SDNode.

Not exclude extloads anymore.
Create new ARMISD nodes specific to load/store of dual registers.
Custom lower i64 volatile loads/stores to these new ARMISD nodes.

Harbormaster completed remote builds in B40875: Diff 229043. Nov 13 2019, 2:34 AM

efriedma added inline comments. Nov 18 2019, 5:52 PM

llvm/lib/Target/ARM/ARMISelLowering.cpp
9039	Loads and stores should not have an "Offset" at this point; we don't form pre/post-indexed operations until after legalization. Maybe worth asserting "isUnindexed()". (Same for loads.)

efriedma added inline comments. Nov 18 2019, 5:52 PM

llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp
3487 ↗	(On Diff #229043)	You should be able to use TableGen patterns for these, I think. AArch64Prefetch is an example of something similar to what you want.
llvm/lib/Target/ARM/ARMISelLowering.cpp
8969	You still need to handle extending loads somehow... Actually, maybe we don't mess with volatile loads in DAGCombine, and you don't need to implement it. In that case, it would still be nice to have an assertion, in case someone changes it at some point.
9027	We also don't want to restrict truncating stores.

dmgreen added inline comments. Nov 19 2019, 4:16 AM

llvm/lib/Target/ARM/ARMISelLowering.cpp
9041	Likewise, this can just say: if (Subtarget->hasMVEIntegerOps() && (VT == MVT::v4i1 || VT == MVT::v8i1 || VT == MVT::v16i1)). The other cases (isTruncatingStore and isUnindexed) should not come up. If they do we should noisily fail with an assert.
9245	How come this is altered, but not the LowerPredicateLoad?

vhscampos marked an inline comment as done. Nov 19 2019, 5:16 AM

vhscampos added inline comments.

llvm/lib/Target/ARM/ARMISelLowering.cpp
9245	The custom lowering of loads and stores here is triggered by the DAG Type Legalizer, since i64 is not supported. In DAGTypeLegalizer::CustomLowerNode(), custom lowering of loads is directed to ARMTargetLowering::ReplaceNodeResults(), which then calls LowerLOAD(), created in the present patch. The lowering of stores is the one that is directed to ARMTargetLowering::LowerOperation(). In summary, the custom lowering of loads because of illegal result types does not go through here, so I believe there's no need to change it at this point.

Move the custom SD nodes to TableGen.
Truncating stores not restricted anymore.
Remove isUnindexed() calls since they should return true always at this point.
Extend test to also check loads/stores that have an offset.

vhscampos marked an inline comment as done. Nov 20 2019, 8:49 AM

vhscampos added inline comments.

llvm/lib/Target/ARM/ARMISelLowering.cpp
8969	Your last comment on this wasn't clear to me. If after the latest change you still want me to add anything, please let me know.

Harbormaster completed remote builds in B41248: Diff 230276. Nov 20 2019, 8:53 AM

efriedma added inline comments. Nov 21 2019, 4:34 PM

llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp
1279 ↗	(On Diff #230276)	Are you sure this range computation is right? The range should be multiples of 4 from -1020 to 1020.

Fixed the immediate range check to cover -1020 to 1020.
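
The range named in the review (multiples of 4 from -1020 to 1020) can be sketched as a standalone predicate. This is an illustration only, not the code from ARMISelDAGToDAG.cpp; the name `isLegalDualOffset` is hypothetical, and the range is presumably the one accepted by the Thumb-2 immediate form of LDRD/STRD.

```cpp
// Hypothetical helper, not taken from the patch: returns true when Offset
// lies in the range quoted in the review, i.e. a multiple of 4 between
// -1020 and 1020 inclusive.
constexpr bool isLegalDualOffset(int Offset) {
  return Offset >= -1020 && Offset <= 1020 && Offset % 4 == 0;
}

// Boundary cases, checked at compile time.
static_assert(isLegalDualOffset(1020), "upper bound is legal");
static_assert(isLegalDualOffset(-1020), "lower bound is legal");
static_assert(!isLegalDualOffset(1024), "just over the upper bound");
static_assert(!isLegalDualOffset(-1024), "just under the lower bound");
static_assert(!isLegalDualOffset(1018), "in range but not a multiple of 4");
```

The boundary values here are the kind of cases the later request for near-boundary tests is aimed at.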

Harbormaster completed remote builds in B41363: Diff 230622. Nov 22 2019, 3:22 AM

dmgreen added inline comments. Nov 27 2019, 7:59 AM

llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp
1279 ↗	(On Diff #230276)	It may be worth adding tests for cases that are close to or just over the boundary.

Update summary to have a better explanation of this patch.
Add a post-ISel hook to add register allocation hints to LDRD/STRD operands.
In the AArch32 case, move ISel from TableGen back to the C++ side. This is needed because we must have a custom lowering whenever LDRD/STRD selection would normally yield a register offset. The ARM Load/Store Optimizer is not able to handle LDRD/STRD's register offsets in the cases where LDRD/STRD must be reverted to LDM/STM. As such, the C++ instruction selection opts for not generating instructions with a register offset.
Improve test by testing several immediate boundary cases.

vhscampos edited the summary of this revision. Dec 11 2019, 5:48 AM

Harbormaster completed remote builds in B42303: Diff 233337. Dec 11 2019, 5:50 AM

efriedma added inline comments. Dec 12 2019, 10:28 AM

llvm/lib/Target/ARM/ARMISelLowering.cpp
10749	Oh, I didn't notice this before. There's a problem here: if we're trying to generate LDRD for the sake of the extra guarantees provided by some targets, transforming that to LDM is wrong; it doesn't have the same guarantee, and therefore could cause unpredictable, subtle problems. Probably we want to allocate a GPRPair, instead of allocating two separate registers and trying to tie them together with a hint. Maybe requires defining a new pseudo-instruction that takes a GPRPair instead of two GPRs. Or I'd be okay with just restricting the optimization to Thumb2 for now, if you don't want to do the extra work right now.

vhscampos edited the summary of this revision. Dec 16 2019, 6:29 AM

Create ARM PseudoInsts that take a register pair operand. This way we can enforce the allocation requirement.
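
As background for the allocation requirement, the following sketch shows the register-pair rule as it is generally documented for the ARM-mode (A32) LDRD/STRD encodings: the first transfer register is even-numbered, the second is the next register, and the pair may not end at r15. This is an assumption drawn from the architecture documentation rather than from the patch, and `isValidArmDualPair` is a hypothetical name.

```cpp
// Hypothetical helper, not taken from the patch. Registers are given by
// number (r0 = 0 ... r15 = 15). Assumed A32 rule: Rt is even, Rt2 is
// Rt + 1, and Rt is not r14 (which would make Rt2 the PC).
constexpr bool isValidArmDualPair(unsigned Rt, unsigned Rt2) {
  return Rt % 2 == 0 && Rt2 == Rt + 1 && Rt != 14;
}

static_assert(isValidArmDualPair(0, 1), "r0/r1 is a valid pair");
static_assert(!isValidArmDualPair(1, 2), "first register must be even");
static_assert(!isValidArmDualPair(0, 2), "registers must be consecutive");
static_assert(!isValidArmDualPair(14, 15), "pair may not include the PC");
```

A register-pair operand lets the register allocator satisfy this rule by construction, instead of relying on allocation hints that may not be honoured.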

Harbormaster completed remote builds in B42547: Diff 234045. Dec 16 2019, 6:33 AM

efriedma added inline comments. Dec 17 2019, 4:33 PM

llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp
3549 ↗	(On Diff #234045)	I think you could implement this pattern in TableGen (grep for REG_SEQUENCE in the ARM .td files). But probably not the corresponding load pattern, since the inverse opcode doesn't exist, so not a big deal either way.
llvm/lib/Target/ARM/ARMInstrInfo.td
2704 ↗	(On Diff #234045)	hasExtraDefRegAllocReq shouldn't be necessary; the regular ldrd/strd are weird because the register allocation constraint isn't expressed correctly by the operand types, but you don't have that problem here.

Changed STOREDUAL instruction selection to use TableGen patterns.
Removed one constraint from LOADDUAL and STOREDUAL.

Harbormaster completed remote builds in B42718: Diff 234493. Dec 18 2019, 2:49 AM

LGTM

This revision is now accepted and ready to land. Dec 18 2019, 2:26 PM

Closed by commit rGbbcf1c3496ce: [ARM] Improve codegen of volatile load/store of i64 (authored by vhscampos). Dec 19 2019, 3:24 AM

This revision was automatically updated to reflect the committed changes.

Hi Victor,

this commit breaks "LLVM::2010-05-03-OriginDIE.ll" test on "llvm-clang-win-x-armv7l" builder.
Here is the failed build http://lab.llvm.org:8011/builders/llvm-clang-win-x-armv7l/builds/1851

Would you take a look?

Thanks,
Vlad.

I created a new patch fixing the issue: https://reviews.llvm.org/D71749

Hi Victor, we're observing a crash in Clang when compiling the Linux kernel bisected to this commit. Can you please revert?
See the trace in https://ci.linaro.org/job/tcwg_kernel-bisect-llvm-master-arm-next-allmodconfig/62/artifact/artifacts/build-bbcf1c3496ce2bd1ed87e8fb15ad896e279633ce/console.log grep for stack dump.

I have reverted this commit.

Also, this revision was reopened so that I can append its fix here for reviewing.

This revision is now accepted and ready to land. Dec 20 2019, 10:13 AM

This is the same patch as before but with fixes for the tests that regressed (as reported in the comments here).

The fix is to specify the address mode of the two pseudo instructions introduced.

Harbormaster completed remote builds in B43154: Diff 235844. Jan 2 2020, 3:30 AM

Thanks @vhscampos, I tested the new patch and could no longer reproduce the observed crashes. I'm not sure if I'm looking at the interdiff correctly, but consider adding additional test cases that properly describe and cover the breakage we observed.

Added one test to cover the issue related to loads/stores to a stack frame.

Harbormaster completed remote builds in B43418: Diff 236557. Jan 7 2020, 5:14 AM

Closed by commit rG60e0120c913d: [ARM] Improve codegen of volatile load/store of i64 (authored by vhscampos). Jan 7 2020, 5:23 AM

This revision was automatically updated to reflect the committed changes.

@vhscampos sorry, we're getting new/different warnings now seemingly with this patch: https://github.com/ClangBuiltLinux/linux/issues/838

Warning: index register overlaps transfer register

nathanchance added a subscriber: nathanchance. Jan 16 2020, 11:29 AM

Apparently the ARM-mode LDRD is a bit more strange than I realized. From the ARM manual: "if t2 == 15 || m == 15 || m == t || m == t2 then UNPREDICTABLE;". I guess we've managed to avoid running into this in the past by never generating the register form of ldrd.

It should be possible to express this constraint to the register allocator using @earlyclobber. (@earlyclobber is actually a little more conservative than we need, strictly speaking, but the difference probably doesn't matter too much.)
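
The condition quoted from the ARM manual can be written as a small standalone predicate. This is only an illustration of that one condition, not code from the patch; `isUnpredictableRegOffsetLDRD` is a hypothetical name, with `t` and `t2` the transfer registers and `m` the offset (index) register, all given by number.

```cpp
// Hypothetical helper mirroring the quoted pseudocode:
//   if t2 == 15 || m == 15 || m == t || m == t2 then UNPREDICTABLE;
constexpr bool isUnpredictableRegOffsetLDRD(unsigned t, unsigned t2,
                                            unsigned m) {
  return t2 == 15 || m == 15 || m == t || m == t2;
}

// ldrd r0, r1, [r4, r2]: the offset register is distinct from both
// transfer registers, so the quoted condition does not fire.
static_assert(!isUnpredictableRegOffsetLDRD(0, 1, 2), "no overlap");
// ldrd r0, r1, [r4, r0]: the index register overlaps a transfer register,
// which matches the assembler warning reported above.
static_assert(isUnpredictableRegOffsetLDRD(0, 1, 0), "m == t");
static_assert(isUnpredictableRegOffsetLDRD(0, 1, 1), "m == t2");
```

Marking the loaded registers as early-clobber would rule out the `m == t` and `m == t2` cases at register allocation time, at the cost of also excluding some overlaps that the condition itself does not forbid.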

@nickdesaulniers @efriedma @nathanchance Apologies for missing the latest comments! Since this seems to be a blocker, I'd suggest that this change gets reverted until I am able to have a closer look at the issue of register overlapping. What do you think?

Please revert; I am happy to test a new revision to make sure there are no warnings but I don’t want this shipped in clang-10 and a revert is something that we can easily backport unless you can come up with a fix rather quickly.

Reverted in the master branch.

vhscampos reopened this revision. Feb 8 2020, 5:27 AM

This revision is now accepted and ready to land. Feb 8 2020, 5:27 AM

vhscampos planned changes to this revision. Feb 8 2020, 5:28 AM

In D70072#1865561, @vhscampos wrote:

Reverted in the master branch.

I see it was also reverted on the release/10.x branch 7996b49053f0508717f4a081d197ddc3073f4b5f. Thanks for keeping the branch in mind, but as mentioned in http://lists.llvm.org/pipermail/llvm-dev/2020-January/138295.html please check with me before pushing directly to it.

alanphipps added a subscriber: alanphipps. Feb 10 2020, 2:49 PM

Changes:

1. Explicitly avoid using the register-offset variant of LDRD/STRD. This variant has a limitation on register allocation: the register allocated to the register-offset cannot be reused in any of the remaining operands. I could not find an easy way to implement this in LLVM, so I left it as a to-do for the future.
2. Instruction selection of STRD was moved from TableGen to C++ because of point (1).
3. Updated tests to reflect these changes.

This revision is now accepted and ready to land. Mar 10 2020, 10:09 AM

Herald added a subscriber: danielkiss. Mar 10 2020, 10:09 AM

Can I please have this reviewed again? I have addressed the issues reported.

Harbormaster failed remote builds in B48711: Diff 249434! Mar 10 2020, 10:53 AM

Boot tested a clang built arm32 linux kernel in QEMU with this patch applied. Thanks for following up with fixes.

This revision is now accepted and ready to land. Mar 10 2020, 1:05 PM

LGTM with one minor comment.

llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp
3622 ↗	(On Diff #249434)	The RegOffset check could use a comment explaining what it's doing, here and for STRD.

Added comment to explain the fallback to non-register-offset variants.

Closed by commit rG8a1255322318: [ARM] Improve codegen of volatile load/store of i64 (authored by vhscampos). Mar 11 2020, 3:47 AM

This revision was automatically updated to reflect the committed changes.

Harbormaster failed remote builds in B48792: Diff 249573! Mar 11 2020, 4:30 AM

vhscampos reopened this revision. May 27 2020, 7:56 AM

This revision is now accepted and ready to land. May 27 2020, 7:56 AM

Improve the testcase which exercises loads and stores from stack. Now, wrong frame index replacements will be caught here.

vhscampos requested review of this revision. May 27 2020, 8:01 AM

Harbormaster failed remote builds in B58058: Diff 266545! May 27 2020, 9:11 AM

LGTM

This revision is now accepted and ready to land. May 27 2020, 4:10 PM

Closed by commit rGc010d4d19550: [ARM] Improve codegen of volatile load/store of i64 (authored by vhscampos). May 28 2020, 3:13 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/lib/Target/ARM/ARMISelLowering.h	2 lines
llvm/lib/Target/ARM/ARMISelLowering.cpp	78 lines
llvm/test/CodeGen/ARM/i64_volatile_load_store.ll	30 lines

Diff 228663

llvm/lib/Target/ARM/ARMISelLowering.h

[... 725 lines not shown ...]	private:
SDValue LowerREM(SDNode *N, SelectionDAG &DAG) const;		SDValue LowerREM(SDNode *N, SelectionDAG &DAG) const;
SDValue LowerDYNAMIC_STACKALLOC(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerDYNAMIC_STACKALLOC(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFP_ROUND(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFP_ROUND(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFP_EXTEND(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFP_EXTEND(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFP_TO_INT(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFP_TO_INT(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerINT_TO_FP(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerINT_TO_FP(SDValue Op, SelectionDAG &DAG) const;
void lowerABS(SDNode *N, SmallVectorImpl<SDValue> &Results,		void lowerABS(SDNode *N, SmallVectorImpl<SDValue> &Results,
SelectionDAG &DAG) const;		SelectionDAG &DAG) const;
		void LowerLOAD(SDNode *N, SmallVectorImpl<SDValue> &Results,
		SelectionDAG &DAG) const;

Register getRegisterByName(const char* RegName, EVT VT,		Register getRegisterByName(const char* RegName, EVT VT,
const MachineFunction &MF) const override;		const MachineFunction &MF) const override;

SDValue BuildSDIVPow2(SDNode *N, const APInt &Divisor, SelectionDAG &DAG,		SDValue BuildSDIVPow2(SDNode *N, const APInt &Divisor, SelectionDAG &DAG,
SmallVectorImpl<SDNode *> &Created) const override;		SmallVectorImpl<SDNode *> &Created) const override;

bool isFMAFasterThanFMulAndFAdd(EVT VT) const override;		bool isFMAFasterThanFMulAndFAdd(EVT VT) const override;
[... 118 lines not shown ...]

llvm/lib/Target/ARM/ARMISelLowering.cpp

[... 1,044 lines not shown ...]	ARMTargetLowering::ARMTargetLowering(const TargetMachine &TM,

setOperationAction(ISD::SHL_PARTS, MVT::i32, Custom);		setOperationAction(ISD::SHL_PARTS, MVT::i32, Custom);
setOperationAction(ISD::SRA_PARTS, MVT::i32, Custom);		setOperationAction(ISD::SRA_PARTS, MVT::i32, Custom);
setOperationAction(ISD::SRL_PARTS, MVT::i32, Custom);		setOperationAction(ISD::SRL_PARTS, MVT::i32, Custom);
setOperationAction(ISD::SRL, MVT::i64, Custom);		setOperationAction(ISD::SRL, MVT::i64, Custom);
setOperationAction(ISD::SRA, MVT::i64, Custom);		setOperationAction(ISD::SRA, MVT::i64, Custom);
setOperationAction(ISD::INTRINSIC_VOID, MVT::Other, Custom);		setOperationAction(ISD::INTRINSIC_VOID, MVT::Other, Custom);
setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::i64, Custom);		setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::i64, Custom);
		setOperationAction(ISD::LOAD, MVT::i64, Custom);
		setOperationAction(ISD::STORE, MVT::i64, Custom);

// MVE lowers 64 bit shifts to lsll and lsrl		// MVE lowers 64 bit shifts to lsll and lsrl
// assuming that ISD::SRL and SRA of i64 are already marked custom		// assuming that ISD::SRL and SRA of i64 are already marked custom
if (Subtarget->hasMVEIntegerOps())		if (Subtarget->hasMVEIntegerOps())
setOperationAction(ISD::SHL, MVT::i64, Custom);		setOperationAction(ISD::SHL, MVT::i64, Custom);

// Expand to __aeabi_l{lsl,lsr,asr} calls for Thumb1.		// Expand to __aeabi_l{lsl,lsr,asr} calls for Thumb1.
if (Subtarget->isThumb1Only()) {		if (Subtarget->isThumb1Only()) {
[... 7,891 lines not shown ...]	SDValue Load = DAG.getExtLoad(
LD->getMemOperand());		LD->getMemOperand());
SDValue Pred = DAG.getNode(ARMISD::PREDICATE_CAST, dl, MVT::v16i1, Load);		SDValue Pred = DAG.getNode(ARMISD::PREDICATE_CAST, dl, MVT::v16i1, Load);
if (MemVT != MVT::v16i1)		if (MemVT != MVT::v16i1)
Pred = DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, MemVT, Pred,		Pred = DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, MemVT, Pred,
DAG.getConstant(0, dl, MVT::i32));		DAG.getConstant(0, dl, MVT::i32));
return DAG.getMergeValues({Pred, Load.getValue(1)}, dl);		return DAG.getMergeValues({Pred, Load.getValue(1)}, dl);
}		}

		void ARMTargetLowering::LowerLOAD(SDNode *N, SmallVectorImpl<SDValue> &Results,
		SelectionDAG &DAG) const {
		LoadSDNode *LD = cast<LoadSDNode>(N);
		EVT MemVT = LD->getMemoryVT();

		if (MemVT == MVT::i64 && Subtarget->hasV5TEOps() &&
		!Subtarget->isThumb1Only() &&
		LD->getExtensionType() == ISD::NON_EXTLOAD && LD->isVolatile()) {
		[Inline comment] efriedma: I'd prefer not to exclude extending loads here. Could lead to weird cases where we miss the transform.
		[Inline comment] efriedma: You still need to handle extending loads somehow... Actually, maybe we don't mess with volatile loads in DAGCombine, and you don't need to implement it. In that case, it would still be nice to have an assertion, in case someone changes it at some point.
		[Inline comment] vhscampos (author): Your last comment on this wasn't clear to me. If after the latest change you still want me to add anything, please let me know.
		SDLoc dl(N);
		const SDValue &Offset = LD->isIndexed()
		? LD->getOffset()
		: DAG.getTargetConstant(0, dl, MVT::i32);
		unsigned OpCode = Subtarget->isThumb2() ? ARM::t2LDRDi8 : ARM::LDRD;
		SmallVector<SDValue, 6> Ops = {LD->getBasePtr()};
		if (!Subtarget->isThumb2()) {
		Ops.push_back(DAG.getRegister(0, MVT::i32));
		}
		Ops.append({Offset,
		DAG.getTargetConstant((uint64_t)ARMCC::AL, dl, MVT::i32),
		DAG.getRegister(0, MVT::i32), LD->getChain()});
		MachineSDNode *Result =
		DAG.getMachineNode(OpCode, dl, {MVT::i32, MVT::i32, MVT::Other}, Ops);
		[Inline comment] efriedma: Please don't make MachineSDNodes this early; it might appear to mostly work, but other code is not expecting MachineSDNodes at this point. See ARMISelLowering.h for how to introduce a target-specific SDNode.
		MachineMemOperand *MemOp = cast<MemSDNode>(N)->getMemOperand();
		DAG.setNodeMemRefs(cast<MachineSDNode>(Result), {MemOp});
		SDValue Pair = DAG.getNode(ISD::BUILD_PAIR, dl, MVT::i64,
		SDValue(Result, 0), SDValue(Result, 1));
		Results.append({Pair, SDValue(Result, 2) /* Chain */});
		}
		}

static SDValue LowerPredicateStore(SDValue Op, SelectionDAG &DAG) {		static SDValue LowerPredicateStore(SDValue Op, SelectionDAG &DAG) {
StoreSDNode *ST = cast<StoreSDNode>(Op.getNode());		StoreSDNode *ST = cast<StoreSDNode>(Op.getNode());
EVT MemVT = ST->getMemoryVT();		EVT MemVT = ST->getMemoryVT();
assert((MemVT == MVT::v4i1 || MemVT == MVT::v8i1 || MemVT == MVT::v16i1) &&		assert((MemVT == MVT::v4i1 || MemVT == MVT::v8i1 || MemVT == MVT::v16i1) &&
"Expected a predicate type!");		"Expected a predicate type!");
assert(MemVT == ST->getValue().getValueType());		assert(MemVT == ST->getValue().getValueType());
assert(!ST->isTruncatingStore() && "Expected a non-extending store");		assert(!ST->isTruncatingStore() && "Expected a non-extending store");
assert(ST->isUnindexed() && "Expected a unindexed store");		assert(ST->isUnindexed() && "Expected a unindexed store");
[... 13 lines not shown ...]	static SDValue LowerPredicateStore(SDValue Op, SelectionDAG &DAG) {
}		}
SDValue GRP = DAG.getNode(ARMISD::PREDICATE_CAST, dl, MVT::i32, Build);		SDValue GRP = DAG.getNode(ARMISD::PREDICATE_CAST, dl, MVT::i32, Build);
return DAG.getTruncStore(		return DAG.getTruncStore(
ST->getChain(), dl, GRP, ST->getBasePtr(),		ST->getChain(), dl, GRP, ST->getBasePtr(),
EVT::getIntegerVT(*DAG.getContext(), MemVT.getSizeInBits()),		EVT::getIntegerVT(*DAG.getContext(), MemVT.getSizeInBits()),
ST->getMemOperand());		ST->getMemOperand());
}		}

		static SDValue LowerSTORE(SDValue Op, SelectionDAG &DAG,
		const ARMSubtarget *Subtarget) {
		StoreSDNode *ST = cast<StoreSDNode>(Op.getNode());
		EVT MemVT = ST->getMemoryVT();

		if (MemVT == MVT::i64 && Subtarget->hasV5TEOps() &&
		!Subtarget->isThumb1Only() && !ST->isTruncatingStore() &&
		[Inline comment] efriedma: We also don't want to restrict truncating stores.
		ST->isVolatile()) {
		SDNode *N = Op.getNode();
		SDLoc dl(N);

		SDValue Lo = DAG.getNode(ISD::EXTRACT_ELEMENT, dl, MVT::i32, ST->getValue(),
		DAG.getTargetConstant(0, dl, MVT::i32));
		SDValue Hi = DAG.getNode(ISD::EXTRACT_ELEMENT, dl, MVT::i32, ST->getValue(),
		DAG.getTargetConstant(1, dl, MVT::i32));
		const SDValue &Offset = ST->isIndexed()
		? ST->getOffset()
		: DAG.getTargetConstant(0, dl, MVT::i32);
		unsigned OpCode = Subtarget->isThumb2() ? ARM::t2STRDi8 : ARM::STRD;
		[Inline comment] efriedma: Loads and stores should not have an "Offset" at this point; we don't form pre/post-indexed operations until after legalization. Maybe worth asserting "isUnindexed()". (Same for loads.)
		SmallVector<SDValue, 8> Ops = {Lo, Hi, ST->getBasePtr()};
		if (!Subtarget->isThumb2()) {
		[Inline comment] dmgreen: Likewise, this can just say: if (Subtarget->hasMVEIntegerOps() && (VT == MVT::v4i1 || VT == MVT::v8i1 || VT == MVT::v16i1)). The other cases (isTruncatingStore and isUnindexed) should not come up. If they do we should noisily fail with an assert.
		Ops.push_back(DAG.getRegister(0, MVT::i32));
		}
		Ops.append({Offset,
		DAG.getTargetConstant((uint64_t)ARMCC::AL, dl, MVT::i32),
		DAG.getRegister(0, MVT::i32), ST->getChain()});
		MachineSDNode *Result = DAG.getMachineNode(OpCode, dl, MVT::Other, Ops);
		MachineMemOperand *MemOp = cast<MemSDNode>(N)->getMemOperand();
		DAG.setNodeMemRefs(cast<MachineSDNode>(Result), {MemOp});

		return SDValue(Result, 0);
		} else if ((MemVT == MVT::v4i1 || MemVT == MVT::v8i1 ||
		MemVT == MVT::v16i1) &&
		!ST->isTruncatingStore() && ST->isUnindexed()) {
		return LowerPredicateStore(Op, DAG);
		}

		return SDValue();
		}

static SDValue LowerMLOAD(SDValue Op, SelectionDAG &DAG) {		static SDValue LowerMLOAD(SDValue Op, SelectionDAG &DAG) {
MaskedLoadSDNode *N = cast<MaskedLoadSDNode>(Op.getNode());		MaskedLoadSDNode *N = cast<MaskedLoadSDNode>(Op.getNode());
MVT VT = Op.getSimpleValueType();		MVT VT = Op.getSimpleValueType();
SDValue Mask = N->getMask();		SDValue Mask = N->getMask();
SDValue PassThru = N->getPassThru();		SDValue PassThru = N->getPassThru();
SDLoc dl(Op);		SDLoc dl(Op);

auto IsZero = [](SDValue PassThru) {		auto IsZero = [](SDValue PassThru) {
[... 168 lines not shown ...]	SDValue ARMTargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const {
case ISD::USUBO:		case ISD::USUBO:
return LowerUnsignedALUO(Op, DAG);		return LowerUnsignedALUO(Op, DAG);
case ISD::SADDSAT:		case ISD::SADDSAT:
case ISD::SSUBSAT:		case ISD::SSUBSAT:
return LowerSADDSUBSAT(Op, DAG, Subtarget);		return LowerSADDSUBSAT(Op, DAG, Subtarget);
case ISD::LOAD:		case ISD::LOAD:
return LowerPredicateLoad(Op, DAG);		return LowerPredicateLoad(Op, DAG);
case ISD::STORE:		case ISD::STORE:
return LowerPredicateStore(Op, DAG);		return LowerSTORE(Op, DAG, Subtarget);
		[Inline comment] dmgreen: How come this is altered, but not the LowerPredicateLoad?
		[Inline comment] vhscampos (author): The custom lowering of loads and stores here is triggered by the DAG Type Legalizer, since i64 is not supported. In DAGTypeLegalizer::CustomLowerNode(), custom lowering of loads is directed to ARMTargetLowering::ReplaceNodeResults(), which then calls LowerLOAD(), created in the present patch. The lowering of stores is the one that is directed to ARMTargetLowering::LowerOperation(). In summary, the custom lowering of loads because of illegal result types does not go through here, so I believe there's no need to change it at this point.
case ISD::MLOAD:		case ISD::MLOAD:
return LowerMLOAD(Op, DAG);		return LowerMLOAD(Op, DAG);
case ISD::ATOMIC_LOAD:		case ISD::ATOMIC_LOAD:
case ISD::ATOMIC_STORE: return LowerAtomicLoadStore(Op, DAG);		case ISD::ATOMIC_STORE: return LowerAtomicLoadStore(Op, DAG);
case ISD::FSINCOS: return LowerFSINCOS(Op, DAG);		case ISD::FSINCOS: return LowerFSINCOS(Op, DAG);
case ISD::SDIVREM:		case ISD::SDIVREM:
case ISD::UDIVREM: return LowerDivRem(Op, DAG);		case ISD::UDIVREM: return LowerDivRem(Op, DAG);
case ISD::DYNAMIC_STACKALLOC:		case ISD::DYNAMIC_STACKALLOC:
[... 83 lines not shown ...]	void ARMTargetLowering::ReplaceNodeResults(SDNode *N,
case ISD::ATOMIC_CMP_SWAP:		case ISD::ATOMIC_CMP_SWAP:
ReplaceCMP_SWAP_64Results(N, Results, DAG);		ReplaceCMP_SWAP_64Results(N, Results, DAG);
return;		return;
case ISD::INTRINSIC_WO_CHAIN:		case ISD::INTRINSIC_WO_CHAIN:
return ReplaceLongIntrinsic(N, Results, DAG);		return ReplaceLongIntrinsic(N, Results, DAG);
case ISD::ABS:		case ISD::ABS:
lowerABS(N, Results, DAG);		lowerABS(N, Results, DAG);
return ;		return ;
		case ISD::LOAD:
		LowerLOAD(N, Results, DAG);
		break;
}		}
if (Res.getNode())		if (Res.getNode())
Results.push_back(Res);		Results.push_back(Res);
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// ARM Scheduler Hooks		// ARM Scheduler Hooks
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
[... 1,385 lines not shown ...]	if (MI.getOpcode() == ARM::MEMCPY) {
attachMEMCPYScratchRegs(Subtarget, MI, Node);		attachMEMCPYScratchRegs(Subtarget, MI, Node);
return;		return;
}		}

const MCInstrDesc *MCID = &MI.getDesc();		const MCInstrDesc *MCID = &MI.getDesc();
// Adjust potentially 's' setting instructions after isel, i.e. ADC, SBC, RSB,		// Adjust potentially 's' setting instructions after isel, i.e. ADC, SBC, RSB,
// RSC. Coming out of isel, they have an implicit CPSR def, but the optional		// RSC. Coming out of isel, they have an implicit CPSR def, but the optional
// operand is still set to noreg. If needed, set the optional operand's		// operand is still set to noreg. If needed, set the optional operand's
// register to CPSR, and remove the redundant implicit def.		// register to CPSR, and remove the redundant implicit def.
		[Inline comment] efriedma: Oh, I didn't notice this before. There's a problem here: if we're trying to generate LDRD for the sake of the extra guarantees provided by some targets, transforming that to LDM is wrong; it doesn't have the same guarantee, and therefore could cause unpredictable, subtle problems. Probably we want to allocate a GPRPair, instead of allocating two separate registers and trying to tie them together with a hint. Maybe requires defining a new pseudo-instruction that takes a GPRPair instead of two GPRs. Or I'd be okay with just restricting the optimization to Thumb2 for now, if you don't want to do the extra work right now.
//		//
// e.g. ADCS (..., implicit-def CPSR) -> ADC (... opt:def CPSR).		// e.g. ADCS (..., implicit-def CPSR) -> ADC (... opt:def CPSR).

// Rename pseudo opcodes.		// Rename pseudo opcodes.
unsigned NewOpc = convertAddSubFlagsOpcode(MI.getOpcode());		unsigned NewOpc = convertAddSubFlagsOpcode(MI.getOpcode());
unsigned ccOutIdx;		unsigned ccOutIdx;
if (NewOpc) {		if (NewOpc) {
const ARMBaseInstrInfo *TII = Subtarget->getInstrInfo();		const ARMBaseInstrInfo *TII = Subtarget->getInstrInfo();
[... 6,435 lines not shown ...]

llvm/test/CodeGen/ARM/i64_volatile_load_store.ll

This file was added.

				; RUN: llc -mtriple=armv5e-arm-none-eabi %s -o - | FileCheck %s --check-prefix=CHECK-ARMV5TE
				; RUN: llc -mtriple=thumbv6t2-arm-none-eabi %s -o - | FileCheck %s --check-prefix=CHECK-T2
				; RUN: llc -mtriple=armv4t-arm-none-eabi %s -o - | FileCheck %s --check-prefix=CHECK-ARMV4T

				@x = common dso_local global i64 0, align 8
				@y = common dso_local global i64 0, align 8

				define void @test() {
				entry:
				; CHECK-ARMV5TE: ldr [[ADDR0:r[0-9]+]]
				; CHECK-ARMV5TE: ldr [[ADDR1:r[0-9]+]]
				; CHECK-ARMV5TE: ldrd [[R0:r[0-9]+]], [[R1:r[0-9]+]], {{\[}}[[ADDR0]]]
				; CHECK-ARMV5TE: strd [[R0]], [[R1]], {{\[}}[[ADDR1]]]
				; CHECK-T2: movw [[ADDR0:r[0-9]+]], :lower16:x
				; CHECK-T2: movw [[ADDR1:r[0-9]+]], :lower16:y
				; CHECK-T2: movt [[ADDR0]], :upper16:x
				; CHECK-T2: movt [[ADDR1]], :upper16:y
				; CHECK-T2: ldrd [[R0:r[0-9]+]], [[R1:r[0-9]+]], {{\[}}[[ADDR0]]]
				; CHECK-T2: strd [[R0]], [[R1]], {{\[}}[[ADDR1]]]
				; CHECK-ARMV4T: ldr [[ADDR0:r[0-9]+]]
				; CHECK-ARMV4T: ldr [[ADDR1:r[0-9]+]]
				; CHECK-ARMV4T: ldr [[R1:r[0-9]+]], {{\[}}[[ADDR0]]]
				; CHECK-ARMV4T: ldr [[R0:r[0-9]+]], {{\[}}[[ADDR0]], #4]
				; CHECK-ARMV4T: str [[R0]], {{\[}}[[ADDR1]], #4]
				; CHECK-ARMV4T: str [[R1]], {{\[}}[[ADDR1]]]
				%0 = load volatile i64, i64* @x, align 8
				store volatile i64 %0, i64* @y, align 8
				ret void
				}
