This is an archive of the discontinued LLVM Phabricator instance.

Differential D30400

For Thumb1, lower ADDC/ADDE/SUBC/SUBE via the glueless ARMISD nodes, same as already done for ARM and Thumb2.
ClosedPublic

Authored by tyomitch on Feb 27 2017, 3:41 AM.

Details

Reviewers

jmolloy
rogfer01
efriedma

Commits

rG0c93ceb5d85b: For Thumb1, lower ADDC/ADDE/SUBC/SUBE via the glueless ARMISD nodes, same as…
rL297443: For Thumb1, lower ADDC/ADDE/SUBC/SUBE via the glueless ARMISD nodes,

Summary

Unfortunately, due to the Thumb1 idiosyncrasy where the instructions
can be *either* flag-setting *or* conditional, this is not expressible
with TableGen patterns, so we have to go for the custom C++ lowering.
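
For readers less familiar with these nodes, here is a minimal C++ sketch of what an ADDC/ADDE pair computes when a 64-bit addition is split into 32-bit halves, which is the shape the Thumb1 `adds`/`adcs` sequence has to reproduce. The names `CarryResult`, `addc`, `adde` and `add64` are illustrative assumptions and do not come from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Result of a 32-bit add together with its carry-out (the C flag).
struct CarryResult {
  uint32_t value;
  bool carry;
};

// Models ADDC: add two words and produce a carry-out.
CarryResult addc(uint32_t a, uint32_t b) {
  uint64_t wide = static_cast<uint64_t>(a) + b;
  return {static_cast<uint32_t>(wide), (wide >> 32) != 0};
}

// Models ADDE: add two words plus an incoming carry, produce a carry-out.
CarryResult adde(uint32_t a, uint32_t b, bool carryIn) {
  uint64_t wide = static_cast<uint64_t>(a) + b + (carryIn ? 1 : 0);
  return {static_cast<uint32_t>(wide), (wide >> 32) != 0};
}

// 64-bit addition expressed as ADDC on the low halves and ADDE on the
// high halves, with the carry threaded between them.
uint64_t add64(uint64_t x, uint64_t y) {
  CarryResult lo = addc(static_cast<uint32_t>(x), static_cast<uint32_t>(y));
  CarryResult hi = adde(static_cast<uint32_t>(x >> 32),
                        static_cast<uint32_t>(y >> 32), lo.carry);
  return (static_cast<uint64_t>(hi.value) << 32) | lo.value;
}
```

The carry handed from `addc` to `adde` is the value that has to survive in CPSR between the two machine instructions.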

Diff Detail

Build Status

Buildable 4364
Build 4364: arc lint + arc unit

Event Timeline

tyomitch created this revision. Feb 27 2017, 3:41 AM

Harbormaster completed remote builds in B4315: Diff 89861. Feb 27 2017, 3:41 AM

Herald added subscribers: rengolin, aemerson. Feb 27 2017, 3:41 AM

tyomitch added a child revision: D30401: Refactor the multiply-accumulate combines to act on ARMISD::ADD[CE] nodes, instead of the generic ISD::ADD[CE]. Feb 27 2017, 4:14 AM

efriedma added a subscriber: efriedma. Feb 27 2017, 2:17 PM

efriedma added inline comments.

lib/Target/ARM/ARMISelDAGToDAG.cpp
3306	This assertion seems suspicious... why is it true in general?

Thanks Eli!
Indeed, the assertion was wrong; this also shows how insufficient our tests for long adds/subtracts were.
Updating the patch to address both these points.

Updating D30400: For Thumb1, lower ADDC/ADDE/SUBC/SUBE via the glueless ARMISD nodes,

same as already done for ARM and Thumb2.

Few nits, but otherwise, looks good.

lib/Target/ARM/ARMISelDAGToDAG.cpp
3240	Why can't you leave this as an early break?
3272	use LLVM_FALLTHROUGH

tyomitch added inline comments. Feb 28 2017, 6:36 AM

lib/Target/ARM/ARMISelDAGToDAG.cpp
3240	Exactly because I want it to fall through to the next case, if the condition doesn't hold.

Ok, just adding LLVM_FALLTHROUGH should be fine for me. I'll let Eli have a final look and approve.

lib/Target/ARM/ARMISelDAGToDAG.cpp
3240	right, I thought as much.

Added LLVM_FALLTHROUGH
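
To make the "early break" versus fall-through point concrete, here is a small self-contained sketch of the control flow being discussed. It is a toy model rather than the code from the patch: the `Opcode` enum, the `selectCarryOp` function and its boolean parameters are hypothetical, and the standard C++17 `[[fallthrough]]` attribute stands in for the `LLVM_FALLTHROUGH` macro.

```cpp
#include <cassert>
#include <string>

enum class Opcode { SUBE, ADDE, Other };

// Toy selector: SUBE first tries a special pattern; if that does not match,
// it deliberately continues into the ADDE case so both share the Thumb1 path.
std::string selectCarryOp(Opcode opc, bool matchesSMMLS, bool thumb1Only) {
  switch (opc) {
  case Opcode::SUBE:
    if (matchesSMMLS)
      return "SMMLS";
    // An early 'break' here would skip the Thumb1 handling below.
    [[fallthrough]];
  case Opcode::ADDE:
    if (thumb1Only)
      return opc == Opcode::ADDE ? "tADC" : "tSBC";
    break;
  default:
    break;
  }
  return "autogenerated";
}
```

With an early `break` in the `SUBE` case, a Thumb1 `SUBE` that fails the pattern match would never reach the shared handling.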

Harbormaster completed remote builds in B4364: Diff 90031. Feb 28 2017, 7:35 AM

Are you sure we can't use the same codepath we currently use for Thumb2/ARM here?

test/CodeGen/Thumb/long.ll
80	I'd also like to see some tests here for subtraction with an immediate amount. ("add i64 %y, -10" etc.)

Are you sure we can't use the same codepath we currently use for Thumb2/ARM here?

I don't think we can.
The existing codepath is itself quite hairy: quoting a comment in ARMInstrInfo.td,

// Currently, ADDS/SUBS are pseudo opcodes that exist only in the
// selection DAG. They are "lowered" to real ADD/SUB opcodes by
// AdjustInstrPostInstrSelection where we determine whether or not to
// set the "s" bit based on CPSR liveness.
//
// FIXME: Eliminate ADDS/SUBS pseudo opcodes after adding tablegen
// support for an optional CPSR definition that corresponds to the DAG
// node's second value. We can then eliminate the implicit def of CPSR.

For the Thumb1 instructions, we cannot choose "whether or not to set the "s" bit"; it's implicitly set iff the instruction isn't predicated.
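
As a rough illustration of that constraint, the following toy model contrasts the two styles. It only assumes what the comment above states; the struct names `ArmStyleAlu` and `Thumb1StyleAlu` are made up for this sketch and are not LLVM types.

```cpp
#include <cassert>

// ARM / Thumb2 style: the "s" bit and the predicate are independent choices.
struct ArmStyleAlu {
  bool predicated;
  bool sBit;
  bool setsFlags() const { return sBit; }
};

// Thumb1 style: there is no separate "s" bit to choose; flags are written
// exactly when the instruction is not predicated.
struct Thumb1StyleAlu {
  bool predicated;
  bool setsFlags() const { return !predicated; }
};
```

In the first model all four combinations of predicated and flag-setting exist; in the second only two do, which is the property that gets in the way of reusing the ARM/Thumb2 approach directly.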

For the Thumb1 instructions, we cannot choose "whether or not to set the "s" bit"; it's implicitly set iff the instruction isn't predicated.

I think it works out anyway; outside of Thumb1 mode, we want to avoid clobbering CPSR when we don't need to, but it's perfectly legal to produce a dead definition of CPSR.

clobbering CPSR when we don't need to is the least of the problems; what we have in ARM and Thumb2 is that ADD and ADDS are defined separately, the former producing one result (to match an ADD node), and the latter producing two (to match an ADDC node). In Thumb1, we cannot define them separately, so tADD MIs are defined with an OptionalDef for CPSR. The ISel patterns won't let me match an MI with one result value (and an OptionalDef) to an ISD node producing two results. Redefining tADD to always produce two results doesn't work either, because it's assumed, by many layers including AsmParser / AsmPrinter, to still have the OptionalDef for CPSR; and the InstrEmitter won't let me have CPSR as both an OptionalDef and an actual result in the same MI.
Handwave handwave, I cannot really prove that it cannot be done, but I mean I had tried, and I couldn't.

test/CodeGen/Thumb/long.ll
80	Indeed, subtracting immediates wasn't handled well; I'll upload the updated patch.

Patch updated

In Thumb1, we cannot define them separately

Why not? "t2ADDSrr" is a pseudo-instruction, not an actual encoding.

lib/Target/ARM/ARMISelDAGToDAG.cpp
3318	The old patterns don't handle SUBC with an immediate. You can produce this situation with something like this: `long long x(long long a, int b) { return a - (((long long)b << 32) | -1U); }` I think the handling here is correct, but please change it in a separate patch.

Why not? "t2ADDSrr" is a pseudo-instruction, not an actual encoding.

Right; but t2ADC / t2SBC are actual encodings (non-predicable, with non-optional def for CPSR), unlike tADC / tSBC (predicable, with an OptionalDef for CPSR).

It might be possible to do a hybrid implementation, using tPseudoInsts for tADDS / tSUBS, and custom C++ lowering for tADC / tSBC; although this feels like, out of two evils, choosing both.
It would also require duplicating a substantial portion of the code in ARMTargetLowering::AdjustInstrPostInstrSelection to take care of Thumb1 instructions separately, because their MIs have a different structure: in particular, the cc_out operand that AdjustInstrPostInstrSelection is adding must, in Thumb1 instructions, be not last but 1st (and MachineInstr doesn't even have an API to insert a new operand into the middle of an existing instruction).

lib/Target/ARM/ARMISelDAGToDAG.cpp
3318	The old patterns lower this code into `movs r3, #0; mvns r3, r3; subs r0, r0, r3; sbcs r1, r2` on Thumb1, and into the much more compact `subs.w r0, r0, #-1; sbcs r1, r2` on Thumb2. The new code lowers it into `adds r0, r0, #1; sbcs r1, r2`, which is equivalent, and even a bit more compact. I don't really see what the problem is, either with the old patterns or with the new code.
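
The equivalence claimed here can be checked with a small model of the carry flag. This is a sketch under the usual ARM convention that SUBS computes `a + ~b + 1` and leaves C set when no borrow occurs; the helper names (`FlagResult`, `adds`, `subs`, `subNegMatchesAdd`) are illustrative and not part of LLVM.

```cpp
#include <cassert>
#include <cstdint>

// A 32-bit result together with the C flag it leaves behind.
struct FlagResult {
  uint32_t value;
  bool carry;
};

// ADDS: C is the unsigned carry-out of a + b.
FlagResult adds(uint32_t a, uint32_t b) {
  uint64_t wide = static_cast<uint64_t>(a) + b;
  return {static_cast<uint32_t>(wide), (wide >> 32) != 0};
}

// SUBS: computed as a + ~b + 1, so C == 1 means "no borrow".
FlagResult subs(uint32_t a, uint32_t b) {
  uint64_t wide = static_cast<uint64_t>(a) + static_cast<uint32_t>(~b) + 1;
  return {static_cast<uint32_t>(wide), (wide >> 32) != 0};
}

// True when "subs a, #-k" and "adds a, #k" agree on both value and C flag.
bool subNegMatchesAdd(uint32_t a, uint32_t k) {
  FlagResult s = subs(a, static_cast<uint32_t>(0) - k);
  FlagResult t = adds(a, k);
  return s.value == t.value && s.carry == t.carry;
}
```

In this model the two forms agree for every nonzero `k`, so a following `sbcs` sees the same carry either way. They differ only for `k == 0`, where `subs` leaves C set and `adds` leaves it clear.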

t2ADC / t2SBC are actual encodings (non-predicable

t2ADC should be predicable? At least, there isn't any restriction imposed by the architecture.

The rest makes sense; I'll stop pushing.

I don't really see what the problem is, either with the old patterns or with the new code.

I don't really want to mix multiple orthogonal changes, especially without any test coverage.

t2ADC should be predicable?

I'd think so too! As you see, long addition/subtraction is not the neatest part of the ARM backend :-)

I don't really want to mix multiple orthogonal changes, especially without any test coverage.

Right, now I see what you mean.

These changes are rather intertwined (it's easier to handle both ADDC and SUBC in the same branch of the switch block, than to separate them and faithfully replicate the old behaviour) but I will certainly add a test case for SUBC with immediate.

Added tests for SUBC with immediate

Select(RHS.getNode()) must be deferred until RHS has users; otherwise, if Select() converts RHS into a duplicate of an existing node, then the DAG automatically updates all uses of RHS to use the existing node instead, and deletes the RHS's own node.
If we call Select(RHS.getNode()) when RHS doesn't yet have any users, then nothing gets updated, RHS's node gets deleted, and we end up adding uses to a deleted node. Boom!

efriedma added inline comments. Mar 3 2017, 11:40 AM

lib/Target/ARM/ARMISelDAGToDAG.cpp
3299	Do you actually need to call Select() explicitly here? Instruction selection should pick it up automatically, I think.

tyomitch added inline comments. Mar 3 2017, 12:28 PM

lib/Target/ARM/ARMISelDAGToDAG.cpp
3299	No, it doesn't re-lower nodes created by `ARMDAGToDAGISel::Select()`: it is assumed to only output lowered nodes.

efriedma added inline comments. Mar 3 2017, 12:44 PM

lib/Target/ARM/ARMISelDAGToDAG.cpp
3299	Okay. The lowering for ISD::AND has some code which deals with a similar situation, but in a different way. Could you refactor to share the same code?

Copying the trick that the lowering for ISD::AND uses to create and lower a constant node

tyomitch added inline comments. Mar 7 2017, 3:00 PM

test/CodeGen/Thumb/long.ll
56	Now I see that lowering an `(ADDE x, y, (ADDC z, t))` into a chain of `(CopyFromReg CPSR, (tADD z, t)), (CopyFromReg CPSR, (tADC x, y, (CopyToReg CPSR)))`, with the CPSR-copying nodes glued to the arithmetic nodes, doesn't prevent LLVM from scheduling CPSR-clobbering operations between the converted ADDC and the converted ADDE, as in this test case, where a flag-setting tMOVi8 is inserted in the middle. An ugly patch is certainly better than an incorrect one, so I decided to go back and finish the "hybrid implementation" using tPseudoInsts with two integer outputs each for tADDS / tSUBS, and custom C++ lowering for tADC / tSBC.

Hybrid implementation

Hmm... given that you've done most of the work of fixing AdjustInstrPostInstrSelection, how hard would it be to add tADCS/tSBCS pseudo-instructions and send them through AdjustInstrPostInstrSelection, as opposed to using custom selection code in C++? I'm sort of concerned you could run into the same scheduling problem for 128-bit addition.

Adding tADCS/tSBCS pseudo-instructions does indeed let us
simplify the custom selection code quite a bit, but
doesn't get rid of it entirely, as the negative-immediate
operand still needs a "recursive lowering" which cannot
be specified with ISel patterns. (This is similar to how
ISD::AND needs the custom lowering into a tBIC.)

but doesn't get rid of it entirely, as the negative-immediate operand still needs a "recursive lowering" which cannot be specified with ISel patterns.

Could you do this as a DAGCombine instead?

Lowering the negative-immediate operand as a DAGCombine instead
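
The idea behind handling the negative immediate can be sketched as a small normalisation step: a carry-producing add of a small negative constant becomes a subtract of its magnitude (and vice versa), after which the usual immediate ranges apply. This is a hypothetical illustration, not the combine that was committed. `CarryOp`, `Form`, `Choice` and `chooseCarryForm` are invented names; the 0..7 and 8..255 ranges mirror the `imm0_7` and `imm8_255` operand classes visible in the diff.

```cpp
#include <cassert>
#include <cstdint>

enum class CarryOp { Add, Sub };
enum class Form { Imm3, Imm8, Reg };

struct Choice {
  CarryOp op;
  Form form;
  uint32_t imm;  // meaningful only for Imm3 / Imm8
};

// Pick an operation and operand form for "op lhs, imm" that sets the carry.
Choice chooseCarryForm(CarryOp op, int32_t imm) {
  int64_t v = imm;  // widen so negation cannot overflow
  if (v < 0 && v >= -255) {
    // Flip add <-> sub and use the positive magnitude instead.
    op = (op == CarryOp::Add) ? CarryOp::Sub : CarryOp::Add;
    v = -v;
  }
  if (v >= 0 && v < 8)
    return {op, Form::Imm3, static_cast<uint32_t>(v)};
  if (v >= 8 && v < 256)
    return {op, Form::Imm8, static_cast<uint32_t>(v)};
  // Anything else has to be materialised in a register first.
  return {op, Form::Reg, 0};
}
```

For example, `add i64 %y, -10` would have its low half treated as a subtract of 10 in this model, rather than as an add of a constant that must first be built in a register.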

LGTM, with a few minor tweaks.

lib/Target/ARM/ARMISelLowering.cpp
9072	Check isThumb1Only() rather than MCID->getSize()?
9102	Check isThumb1Only() rather than MCID->getSize()?
test/CodeGen/Thumb/long.ll
5	Add -verify-machineinstrs to the RUN lines.

This revision is now accepted and ready to land. Mar 9 2017, 3:30 PM

The minor tweaks

Closed by commit rL297443: For Thumb1, lower ADDC/ADDE/SUBC/SUBE via the glueless ARMISD nodes, (authored by askrobov). Mar 9 2017, 11:52 PM

This revision was automatically updated to reflect the committed changes.

tyomitch mentioned this in D30401: Refactor the multiply-accumulate combines to act on ARMISD::ADD[CE] nodes, instead of the generic ISD::ADD[CE]. Mar 10 2017, 1:01 AM

tyomitch mentioned this in D31081: [ARM] ScheduleDAGRRList::DelayForLiveRegsBottomUp must consider OptionalDefs. Mar 17 2017, 10:47 AM

Diffusion mentioned this in rL301106: [ARM] ScheduleDAGRRList::DelayForLiveRegsBottomUp must consider OptionalDefs. Apr 23 2017, 12:11 AM

Revision Contents

Path

Size

lib/

Target/

ARM/

ARMISelDAGToDAG.cpp

119 lines

ARMISelLowering.cpp

11 lines

ARMInstrThumb.td

20 lines

test/

CodeGen/

Thumb/

long.ll

61 lines

Diff 90031

lib/Target/ARM/ARMISelDAGToDAG.cpp

Show First 20 Lines • Show All 3,158 Lines • ▼ Show 20 Lines	if (Subtarget->isThumb()) {
CurDAG->getRegister(0, MVT::i32),		CurDAG->getRegister(0, MVT::i32),
CurDAG->getRegister(0, MVT::i32) };		CurDAG->getRegister(0, MVT::i32) };
ReplaceNode(N, CurDAG->getMachineNode(		ReplaceNode(N, CurDAG->getMachineNode(
Subtarget->hasV6Ops() ? ARM::SMLAL : ARM::SMLALv5, dl,		Subtarget->hasV6Ops() ? ARM::SMLAL : ARM::SMLALv5, dl,
MVT::i32, MVT::i32, Ops));		MVT::i32, MVT::i32, Ops));
return;		return;
}		}
}		}
case ARMISD::SUBE: {
if (!Subtarget->hasV6Ops())
break;
// Look for a pattern to match SMMLS
// (sube a, (smul_loHi a, b), (subc 0, (smul_LOhi(a, b))))
if (N->getOperand(1).getOpcode() != ISD::SMUL_LOHI ||
N->getOperand(2).getOpcode() != ARMISD::SUBC ||
!SDValue(N, 1).use_empty())
break;

if (Subtarget->isThumb())
assert(Subtarget->hasThumb2() &&
"This pattern should not be generated for Thumb");

SDValue SmulLoHi = N->getOperand(1);
SDValue Subc = N->getOperand(2);
auto *Zero = dyn_cast<ConstantSDNode>(Subc.getOperand(0));

if (!Zero || Zero->getZExtValue() != 0 ||
Subc.getOperand(1) != SmulLoHi.getValue(0) ||
N->getOperand(1) != SmulLoHi.getValue(1) ||
N->getOperand(2) != Subc.getValue(1))
break;

unsigned Opc = Subtarget->isThumb2() ? ARM::t2SMMLS : ARM::SMMLS;
SDValue Ops[] = { SmulLoHi.getOperand(0), SmulLoHi.getOperand(1),
N->getOperand(0), getAL(CurDAG, dl),
CurDAG->getRegister(0, MVT::i32) };
ReplaceNode(N, CurDAG->getMachineNode(Opc, dl, MVT::i32, Ops));
return;
}
case ISD::LOAD: {		case ISD::LOAD: {
if (Subtarget->isThumb() && Subtarget->hasThumb2()) {		if (Subtarget->isThumb() && Subtarget->hasThumb2()) {
if (tryT2IndexedLoad(N))		if (tryT2IndexedLoad(N))
return;		return;
} else if (Subtarget->isThumb()) {		} else if (Subtarget->isThumb()) {
if (tryT1IndexedLoad(N))		if (tryT1IndexedLoad(N))
return;		return;
} else if (tryARMIndexedLoad(N))		} else if (tryARMIndexedLoad(N))
▲ Show 20 Lines • Show All 55 Lines • ▼ Show 20 Lines	if (N->getNumValues() == 2) {
ReplaceUses(SDValue(N, 1), InFlag);		ReplaceUses(SDValue(N, 1), InFlag);
}		}
ReplaceUses(SDValue(N, 0),		ReplaceUses(SDValue(N, 0),
SDValue(Chain.getNode(), Chain.getResNo()));		SDValue(Chain.getNode(), Chain.getResNo()));
CurDAG->RemoveDeadNode(N);		CurDAG->RemoveDeadNode(N);
return;		return;
}		}


		case ARMISD::SUBE:
		if (Subtarget->hasV6Ops()) {
		rengolin: Why can't you leave this as an early break?
		tyomitch: Exactly because I want it to fall through to the next case, if the condition doesn't hold.
		rengolin: right, I thought as much.

		// Look for a pattern to match SMMLS
		// (sube a, (smul_loHi a, b), (subc 0, (smul_LOhi(a, b))))
		if (N->getOperand(1).getOpcode() == ISD::SMUL_LOHI &&
		N->getOperand(2).getOpcode() == ARMISD::SUBC &&
		SDValue(N, 1).use_empty())
		{

		if (Subtarget->isThumb())
		assert(Subtarget->hasThumb2() &&
		"This pattern should not be generated for Thumb");

		SDValue SmulLoHi = N->getOperand(1);
		SDValue Subc = N->getOperand(2);
		auto *Zero = dyn_cast<ConstantSDNode>(Subc.getOperand(0));

		if (Zero && Zero->getZExtValue() == 0 &&
		Subc.getOperand(1) == SmulLoHi.getValue(0) &&
		N->getOperand(1) == SmulLoHi.getValue(1) &&
		N->getOperand(2) == Subc.getValue(1))
		{

		unsigned Opc = Subtarget->isThumb2() ? ARM::t2SMMLS : ARM::SMMLS;
		SDValue Ops[] = { SmulLoHi.getOperand(0), SmulLoHi.getOperand(1),
		N->getOperand(0), getAL(CurDAG, dl),
		CurDAG->getRegister(0, MVT::i32) };
		ReplaceNode(N, CurDAG->getMachineNode(Opc, dl, MVT::i32, Ops));
		return;
		}
		}
		}
		LLVM_FALLTHROUGH;
		rengolin: use LLVM_FALLTHROUGH

		case ARMISD::ADDE:
		if (Subtarget->isThumb1Only()) {
		unsigned Opc = (N->getOpcode() == ARMISD::ADDE) ? ARM::tADC : ARM::tSBC;
		SDValue Carry = N->getOperand(2),
		GlueIn = CurDAG->getCopyToReg(Carry, dl, ARM::CPSR,
		Carry, SDValue()).getValue(1),
		Ops[] = {CurDAG->getRegister(ARM::CPSR, MVT::i32),
		N->getOperand(0), N->getOperand(1),
		getAL(CurDAG, dl), CurDAG->getRegister(0, MVT::i32),
		GlueIn},
		Res(CurDAG->getMachineNode(Opc, dl, MVT::i32, MVT::Glue, Ops), 0),
		GlueOut = CurDAG->getCopyFromReg(Res, dl, ARM::CPSR,
		MVT::i32, Res.getValue(1)),
		Replacement[] = {Res, GlueOut};
		CurDAG->ReplaceAllUsesWith(N, Replacement);
		CurDAG->RemoveDeadNode(N);
		return;
		}
		// Other cases are autogenerated.
		break;

		case ARMISD::ADDC:
		case ARMISD::SUBC:
		if (Subtarget->isThumb1Only()) {
		bool isAdd = N->getOpcode() == ARMISD::ADDC;
		unsigned Opc = isAdd ? ARM::tADDrr : ARM::tSUBrr;
		efriedma: Do you actually need to call Select() explicitly here? Instruction selection should pick it up automatically, I think.
		tyomitch: No, it doesn't re-lower nodes created by `ARMDAGToDAGISel::Select()`: it is assumed to only output lowered nodes.
		efriedma: Okay. The lowering for ISD::AND has some code which deals with a similar situation, but in a different way. Could you refactor to share the same code?
		SDValue RHS = N->getOperand(1);
		if (ConstantSDNode *C = dyn_cast<ConstantSDNode>(RHS)) {
		uint64_t imm = C->getZExtValue();
		if (imm < 256) {
		if (imm < 8)
		Opc = isAdd ? ARM::tADDi3 : ARM::tSUBi3;
		else
		efriedma: This assertion seems suspicious... why is it true in general?
		Opc = isAdd ? ARM::tADDi8 : ARM::tSUBi8;
		RHS = CurDAG->getTargetConstant(imm, dl, MVT::i32);
		}
		}

		SDValue Ops[] = {CurDAG->getRegister(ARM::CPSR, MVT::i32),
		N->getOperand(0), RHS,
		getAL(CurDAG, dl), CurDAG->getRegister(0, MVT::i32)},
		Res(CurDAG->getMachineNode(Opc, dl, MVT::i32, MVT::Glue, Ops), 0),
		Glue = CurDAG->getCopyFromReg(Res, dl, ARM::CPSR,
		MVT::i32, Res.getValue(1)),
		Replacement[] = {Res, Glue};
		efriedma: The old patterns don't handle SUBC with an immediate. You can produce this situation with something like this: `long long x(long long a, int b) { return a - (((long long)b << 32) | -1U); }` I think the handling here is correct, but please change it in a separate patch.
		tyomitch: The old patterns lower this code into `movs r3, #0; mvns r3, r3; subs r0, r0, r3; sbcs r1, r2` on Thumb1, and into the much more compact `subs.w r0, r0, #-1; sbcs r1, r2` on Thumb2. The new code lowers it into `adds r0, r0, #1; sbcs r1, r2`, which is equivalent, and even a bit more compact. I don't really see what the problem is, either with the old patterns or with the new code.
		CurDAG->ReplaceAllUsesWith(N, Replacement);
		CurDAG->RemoveDeadNode(N);
		return;
		}
		// Other cases are autogenerated.
		break;

case ARMISD::CMPZ: {		case ARMISD::CMPZ: {
// select (CMPZ X, #-C) -> (CMPZ (ADDS X, #C), #0)		// select (CMPZ X, #-C) -> (CMPZ (ADDS X, #C), #0)
// This allows us to avoid materializing the expensive negative constant.		// This allows us to avoid materializing the expensive negative constant.
// The CMPZ #0 is useless and will be peepholed away but we need to keep it		// The CMPZ #0 is useless and will be peepholed away but we need to keep it
// for its glue output.		// for its glue output.
SDValue X = N->getOperand(0);		SDValue X = N->getOperand(0);
auto *C = dyn_cast<ConstantSDNode>(N->getOperand(1).getNode());		auto *C = dyn_cast<ConstantSDNode>(N->getOperand(1).getNode());
if (C && C->getSExtValue() < 0 && Subtarget->isThumb()) {		if (C && C->getSExtValue() < 0 && Subtarget->isThumb()) {
▲ Show 20 Lines • Show All 1,399 Lines • Show Last 20 Lines

lib/Target/ARM/ARMISelLowering.cpp

Show First 20 Lines • Show All 820 Lines • ▼ Show 20 Lines	if (Subtarget->isThumb1Only() || !Subtarget->hasV6Ops()
setOperationAction(ISD::MULHS, MVT::i32, Expand);		setOperationAction(ISD::MULHS, MVT::i32, Expand);

setOperationAction(ISD::SHL_PARTS, MVT::i32, Custom);		setOperationAction(ISD::SHL_PARTS, MVT::i32, Custom);
setOperationAction(ISD::SRA_PARTS, MVT::i32, Custom);		setOperationAction(ISD::SRA_PARTS, MVT::i32, Custom);
setOperationAction(ISD::SRL_PARTS, MVT::i32, Custom);		setOperationAction(ISD::SRL_PARTS, MVT::i32, Custom);
setOperationAction(ISD::SRL, MVT::i64, Custom);		setOperationAction(ISD::SRL, MVT::i64, Custom);
setOperationAction(ISD::SRA, MVT::i64, Custom);		setOperationAction(ISD::SRA, MVT::i64, Custom);

if (!Subtarget->isThumb1Only()) {
// FIXME: We should do this for Thumb1 as well.
setOperationAction(ISD::ADDC, MVT::i32, Custom);		setOperationAction(ISD::ADDC, MVT::i32, Custom);
setOperationAction(ISD::ADDE, MVT::i32, Custom);		setOperationAction(ISD::ADDE, MVT::i32, Custom);
setOperationAction(ISD::SUBC, MVT::i32, Custom);		setOperationAction(ISD::SUBC, MVT::i32, Custom);
setOperationAction(ISD::SUBE, MVT::i32, Custom);		setOperationAction(ISD::SUBE, MVT::i32, Custom);
}

if (!Subtarget->isThumb1Only() && Subtarget->hasV6T2Ops())		if (!Subtarget->isThumb1Only() && Subtarget->hasV6T2Ops())
setOperationAction(ISD::BITREVERSE, MVT::i32, Legal);		setOperationAction(ISD::BITREVERSE, MVT::i32, Legal);

// ARM does not have ROTL.		// ARM does not have ROTL.
setOperationAction(ISD::ROTL, MVT::i32, Expand);		setOperationAction(ISD::ROTL, MVT::i32, Expand);
for (MVT VT : MVT::vector_valuetypes()) {		for (MVT VT : MVT::vector_valuetypes()) {
setOperationAction(ISD::ROTL, VT, Expand);		setOperationAction(ISD::ROTL, VT, Expand);
▲ Show 20 Lines • Show All 8,223 Lines • ▼ Show 20 Lines	if (NewOpc) {

MI.setDesc(*MCID);		MI.setDesc(*MCID);

// Add the optional cc_out operand		// Add the optional cc_out operand
MI.addOperand(MachineOperand::CreateReg(0, /*isDef=*/true));		MI.addOperand(MachineOperand::CreateReg(0, /*isDef=*/true));
}		}
unsigned ccOutIdx = MCID->getNumOperands() - 1;		unsigned ccOutIdx = MCID->getNumOperands() - 1;

// Any ARM instruction that sets the 's' bit should specify an optional		// Any ARM instruction that sets the 's' bit should specify an optional
		efriedma: Check isThumb1Only() rather than MCID->getSize()?
// "cc_out" operand in the last operand position.		// "cc_out" operand in the last operand position.
if (!MI.hasOptionalDef() || !MCID->OpInfo[ccOutIdx].isOptionalDef()) {		if (!MI.hasOptionalDef() || !MCID->OpInfo[ccOutIdx].isOptionalDef()) {
assert(!NewOpc && "Optional cc_out operand required");		assert(!NewOpc && "Optional cc_out operand required");
return;		return;
}		}
// Look for an implicit def of CPSR added by MachineInstr ctor. Remove it		// Look for an implicit def of CPSR added by MachineInstr ctor. Remove it
// since we already have an optional CPSR def.		// since we already have an optional CPSR def.
bool definesCPSR = false;		bool definesCPSR = false;
Show All 13 Lines	if (!definesCPSR) {
assert(!NewOpc && "Optional cc_out operand required");		assert(!NewOpc && "Optional cc_out operand required");
return;		return;
}		}
assert(deadCPSR == !Node->hasAnyUseOfValue(1) && "inconsistent dead flag");		assert(deadCPSR == !Node->hasAnyUseOfValue(1) && "inconsistent dead flag");
if (deadCPSR) {		if (deadCPSR) {
assert(!MI.getOperand(ccOutIdx).getReg() &&		assert(!MI.getOperand(ccOutIdx).getReg() &&
"expect uninitialized optional cc_out operand");		"expect uninitialized optional cc_out operand");
return;		return;
}		}
		efriedma: Check isThumb1Only() rather than MCID->getSize()?

// If this instruction was defined with an optional CPSR def and its dag node		// If this instruction was defined with an optional CPSR def and its dag node
// had a live implicit CPSR def, then activate the optional CPSR def.		// had a live implicit CPSR def, then activate the optional CPSR def.
MachineOperand &MO = MI.getOperand(ccOutIdx);		MachineOperand &MO = MI.getOperand(ccOutIdx);
MO.setReg(ARM::CPSR);		MO.setReg(ARM::CPSR);
MO.setIsDef(true);		MO.setIsDef(true);
}		}

▲ Show 20 Lines • Show All 4,503 Lines • Show Last 20 Lines

lib/Target/ARM/ARMInstrThumb.td

Show First 20 Lines • Show All 904 Lines • ▼ Show 20 Lines
}		}

let isAdd = 1 in {		let isAdd = 1 in {
// Add with carry register		// Add with carry register
let isCommutable = 1, Uses = [CPSR] in		let isCommutable = 1, Uses = [CPSR] in
def tADC : // A8.6.2		def tADC : // A8.6.2
T1sItDPEncode<0b0101, (outs tGPR:$Rdn), (ins tGPR:$Rn, tGPR:$Rm), IIC_iALUr,		T1sItDPEncode<0b0101, (outs tGPR:$Rdn), (ins tGPR:$Rn, tGPR:$Rm), IIC_iALUr,
"adc", "\t$Rdn, $Rm",		"adc", "\t$Rdn, $Rm",
[(set tGPR:$Rdn, (adde tGPR:$Rn, tGPR:$Rm))]>, Sched<[WriteALU]>;		[]>, Sched<[WriteALU]>;

// Add immediate		// Add immediate
def tADDi3 : // A8.6.4 T1		def tADDi3 : // A8.6.4 T1
T1sIGenEncodeImm<0b01110, (outs tGPR:$Rd), (ins tGPR:$Rm, imm0_7:$imm3),		T1sIGenEncodeImm<0b01110, (outs tGPR:$Rd), (ins tGPR:$Rm, imm0_7:$imm3),
IIC_iALUi,		IIC_iALUi,
"add", "\t$Rd, $Rm, $imm3",		"add", "\t$Rd, $Rm, $imm3",
[(set tGPR:$Rd, (add tGPR:$Rm, imm0_7:$imm3))]>,		[(set tGPR:$Rd, (add tGPR:$Rm, imm0_7:$imm3))]>,
Sched<[WriteALU]> {		Sched<[WriteALU]> {
▲ Show 20 Lines • Show All 270 Lines • ▼ Show 20 Lines	T1sIDPEncode<0b1001, (outs tGPR:$Rd), (ins tGPR:$Rn),
[(set tGPR:$Rd, (ineg tGPR:$Rn))]>, Sched<[WriteALU]>;		[(set tGPR:$Rd, (ineg tGPR:$Rn))]>, Sched<[WriteALU]>;

// Subtract with carry register		// Subtract with carry register
let Uses = [CPSR] in		let Uses = [CPSR] in
def tSBC : // A8.6.151		def tSBC : // A8.6.151
T1sItDPEncode<0b0110, (outs tGPR:$Rdn), (ins tGPR:$Rn, tGPR:$Rm),		T1sItDPEncode<0b0110, (outs tGPR:$Rdn), (ins tGPR:$Rn, tGPR:$Rm),
IIC_iALUr,		IIC_iALUr,
"sbc", "\t$Rdn, $Rm",		"sbc", "\t$Rdn, $Rm",
[(set tGPR:$Rdn, (sube tGPR:$Rn, tGPR:$Rm))]>,		[]>,
Sched<[WriteALU]>;		Sched<[WriteALU]>;

// Subtract immediate		// Subtract immediate
def tSUBi3 : // A8.6.210 T1		def tSUBi3 : // A8.6.210 T1
T1sIGenEncodeImm<0b01111, (outs tGPR:$Rd), (ins tGPR:$Rm, imm0_7:$imm3),		T1sIGenEncodeImm<0b01111, (outs tGPR:$Rd), (ins tGPR:$Rm, imm0_7:$imm3),
IIC_iALUi,		IIC_iALUi,
"sub", "\t$Rd, $Rm, $imm3",		"sub", "\t$Rd, $Rm, $imm3",
[(set tGPR:$Rd, (add tGPR:$Rm, imm0_7_neg:$imm3))]>,		[(set tGPR:$Rd, (add tGPR:$Rm, imm0_7_neg:$imm3))]>,
▲ Show 20 Lines • Show All 172 Lines • ▼ Show 20 Lines
//		//

// Comparisons		// Comparisons
def : T1Pat<(ARMcmpZ tGPR:$Rn, imm0_255:$imm8),		def : T1Pat<(ARMcmpZ tGPR:$Rn, imm0_255:$imm8),
(tCMPi8 tGPR:$Rn, imm0_255:$imm8)>;		(tCMPi8 tGPR:$Rn, imm0_255:$imm8)>;
def : T1Pat<(ARMcmpZ tGPR:$Rn, tGPR:$Rm),		def : T1Pat<(ARMcmpZ tGPR:$Rn, tGPR:$Rm),
(tCMPr tGPR:$Rn, tGPR:$Rm)>;		(tCMPr tGPR:$Rn, tGPR:$Rm)>;

// Add with carry
def : T1Pat<(addc tGPR:$lhs, imm0_7:$rhs),
(tADDi3 tGPR:$lhs, imm0_7:$rhs)>;
def : T1Pat<(addc tGPR:$lhs, imm8_255:$rhs),
(tADDi8 tGPR:$lhs, imm8_255:$rhs)>;
def : T1Pat<(addc tGPR:$lhs, tGPR:$rhs),
(tADDrr tGPR:$lhs, tGPR:$rhs)>;

// Subtract with carry
def : T1Pat<(addc tGPR:$lhs, imm0_7_neg:$rhs),
(tSUBi3 tGPR:$lhs, imm0_7_neg:$rhs)>;
def : T1Pat<(addc tGPR:$lhs, imm8_255_neg:$rhs),
(tSUBi8 tGPR:$lhs, imm8_255_neg:$rhs)>;
def : T1Pat<(subc tGPR:$lhs, tGPR:$rhs),
(tSUBrr tGPR:$lhs, tGPR:$rhs)>;

// Bswap 16 with load/store		// Bswap 16 with load/store
def : T1Pat<(srl (bswap (extloadi16 t_addrmode_is2:$addr)), (i32 16)),		def : T1Pat<(srl (bswap (extloadi16 t_addrmode_is2:$addr)), (i32 16)),
(tREV16 (tLDRHi t_addrmode_is2:$addr))>;		(tREV16 (tLDRHi t_addrmode_is2:$addr))>;
def : T1Pat<(srl (bswap (extloadi16 t_addrmode_rr:$addr)), (i32 16)),		def : T1Pat<(srl (bswap (extloadi16 t_addrmode_rr:$addr)), (i32 16)),
(tREV16 (tLDRHr t_addrmode_rr:$addr))>;		(tREV16 (tLDRHr t_addrmode_rr:$addr))>;
def : T1Pat<(truncstorei16 (srl (bswap tGPR:$Rn), (i32 16)),		def : T1Pat<(truncstorei16 (srl (bswap tGPR:$Rn), (i32 16)),
t_addrmode_is2:$addr),		t_addrmode_is2:$addr),
(tSTRHi(tREV16 tGPR:$Rn), t_addrmode_is2:$addr)>;		(tSTRHi(tREV16 tGPR:$Rn), t_addrmode_is2:$addr)>;
▲ Show 20 Lines • Show All 198 Lines • Show Last 20 Lines

test/CodeGen/Thumb/long.ll

	; RUN: llc -mtriple=thumb-eabi %s -o - | FileCheck %s			; RUN: llc -mtriple=thumb-eabi %s -o - | FileCheck %s
				; RUN: llc -mtriple=thumbv6-eabi %s -o - | \
				; RUN: FileCheck %s -check-prefix CHECK -check-prefix CHECK-V6
	; RUN: llc -mtriple=thumb-apple-darwin %s -o - | \			; RUN: llc -mtriple=thumb-apple-darwin %s -o - | \
	; RUN: FileCheck %s -check-prefix CHECK -check-prefix CHECK-DARWIN			; RUN: FileCheck %s -check-prefix CHECK -check-prefix CHECK-DARWIN
				efriedma: Add -verify-machineinstrs to the RUN lines.

	define i64 @f1() {			define i64 @f1() {
	entry:			entry:
	ret i64 0			ret i64 0
				; CHECK-LABEL: f1:
				; CHECK: movs r0, #0
				; CHECK-V6: mov r1, r0
	}			}

	define i64 @f2() {			define i64 @f2() {
	entry:			entry:
	ret i64 1			ret i64 1
				; CHECK-LABEL: f2:
				; CHECK: movs r0, #1
				; CHECK: movs r1, #0
	}			}

	define i64 @f3() {			define i64 @f3() {
	entry:			entry:
	ret i64 2147483647			ret i64 2147483647
				; CHECK-LABEL: f3:
				; CHECK: ldr r0,
				; CHECK: movs r1, #0
	}			}

	define i64 @f4() {			define i64 @f4() {
	entry:			entry:
	ret i64 2147483648			ret i64 2147483648
				; CHECK-LABEL: f4:
				; CHECK: movs r0, #1
				; CHECK: lsls r0, r0, #31
				; CHECK: movs r1, #0
	}			}

	define i64 @f5() {			define i64 @f5() {
	entry:			entry:
	ret i64 9223372036854775807			ret i64 9223372036854775807
	; CHECK-LABEL: f5:			; CHECK-LABEL: f5:
	; CHECK: mvn			; CHECK: movs r0, #0
	; CHECK-NOT: mvn			; CHECK: mvns r0, r0
				; CHECK: ldr r1,
	}			}

	define i64 @f6(i64 %x, i64 %y) {			define i64 @f6(i64 %x, i64 %y) {
	entry:			entry:
	%tmp1 = add i64 %y, 1 ; <i64> [#uses=1]			%tmp1 = add i64 %y, 1 ; <i64> [#uses=1]
	ret i64 %tmp1			ret i64 %tmp1
	; CHECK-LABEL: f6:			; CHECK-LABEL: f6:
	; CHECK: adc			; CHECK: adds r0, r2, #1
	; CHECK-NOT: adc			; CHECK: movs r1, #0
				; CHECK: adcs r1, r3
				tyomitch: Now I see that lowering an `(ADDE x, y, (ADDC z, t))` into a chain of `(CopyFromReg CPSR, (tADD z, t)), (CopyFromReg CPSR, (tADC x, y, (CopyToReg CPSR)))`, with the CPSR-copying nodes glued to the arithmetic nodes, doesn't prevent LLVM from scheduling CPSR-clobbering operations between the converted ADDC and the converted ADDE, as in this test case, where a flag-setting tMOVi8 is inserted in the middle. An ugly patch is certainly better than an incorrect one, so I decided to go back and finish the "hybrid implementation" using tPseudoInsts with two integer outputs each for tADDS / tSUBS, and custom C++ lowering for tADC / tSBC.
				}
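To make the concern in the comment above concrete, here is a toy C++ model of a 64-bit add split into two 32-bit halves. It is only a sketch and not LLVM code: `Flags`, `adds`, `adcs`, and `add64` are hypothetical names, only the carry bit of CPSR is modeled, and the `clobber` parameter stands in for any unrelated carry-writing instruction that ends up between the two halves.

```cpp
#include <cstdint>

// Toy model (assumption): only the carry bit of CPSR is tracked.
struct Flags {
  bool carry = false;
};

// Models ADDS: 32-bit add that writes the carry-out.
uint32_t adds(uint32_t a, uint32_t b, Flags &f) {
  uint64_t wide = static_cast<uint64_t>(a) + b;
  f.carry = (wide >> 32) != 0;
  return static_cast<uint32_t>(wide);
}

// Models ADCS: 32-bit add that consumes the carry-in and writes the carry-out.
uint32_t adcs(uint32_t a, uint32_t b, Flags &f) {
  uint64_t wide = static_cast<uint64_t>(a) + b + (f.carry ? 1u : 0u);
  f.carry = (wide >> 32) != 0;
  return static_cast<uint32_t>(wide);
}

// 64-bit add built from the two halves. When `clobber` is true, an unrelated
// carry-writing operation runs between the low and the high half.
uint64_t add64(uint64_t x, uint64_t y, bool clobber) {
  Flags f;
  uint32_t lo = adds(static_cast<uint32_t>(x), static_cast<uint32_t>(y), f);
  if (clobber) {
    adds(0, 0, f); // hypothetical intervening instruction; resets carry to 0
  }
  uint32_t hi = adcs(static_cast<uint32_t>(x >> 32),
                     static_cast<uint32_t>(y >> 32), f);
  return (static_cast<uint64_t>(hi) << 32) | lo;
}
```

In this model, `add64(0xFFFFFFFF, 1, false)` carries into the high word as expected, while the same call with `clobber` set to true loses the carry. That is why the carry producer and consumer must be tied together more tightly than by glued copy nodes that the scheduler is free to separate.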

				define i64 @f6a(i64 %x, i64 %y) {
				entry:
				%tmp1 = add i64 %y, 10 ; <i64> [#uses=1]
				ret i64 %tmp1
				; CHECK-LABEL: f6a:
				; CHECK: adds r2, #10
				; CHECK: movs r1, #0
				; CHECK: adcs r1, r3
				; CHECK-V6: mov r0, r2
				}

				define i64 @f6b(i64 %x, i64 %y) {
				entry:
				%tmp1 = add i64 %y, 1000 ; <i64> [#uses=1]
				ret i64 %tmp1
				; CHECK-LABEL: f6b:
				; CHECK: movs r0, #125
				; CHECK: lsls r0, r0, #3
				; CHECK: adds r0, r2, r0
				; CHECK: movs r1, #0
				; CHECK: adcs r1, r3
	}			}
				efriedma: I'd also like to see some tests here for subtraction with an immediate amount. ("add i64 %y, -10" etc.)
				tyomitch (author): Indeed, subtracting immediates wasn't handled well; I'll upload the updated patch.
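As background for the request above, adding a negative immediate to an i64 is arithmetically the same as a subtract-with-borrow chain over the two 32-bit halves. The C++ sketch below models that equivalence. It is an illustration under stated assumptions, not the code the patch emits: `Cpsr`, `subs`, `sbcs`, and `addNegImm` are hypothetical names. It relies on the ARM convention that a subtraction leaves carry set when no borrow occurred, and that SBC subtracts an extra 1 when carry is clear.

```cpp
#include <cstdint>

// Toy model (assumption): only the carry bit of CPSR is tracked.
struct Cpsr {
  bool carry = false;
};

// Models SUBS: a - b, with carry set when no borrow occurred.
uint32_t subs(uint32_t a, uint32_t b, Cpsr &f) {
  f.carry = a >= b;
  return a - b;
}

// Models SBCS: a - b - (carry ? 0 : 1), updating carry the same way.
uint32_t sbcs(uint32_t a, uint32_t b, Cpsr &f) {
  uint32_t borrow = f.carry ? 0u : 1u;
  uint64_t subtrahend = static_cast<uint64_t>(b) + borrow;
  f.carry = static_cast<uint64_t>(a) >= subtrahend;
  return a - b - borrow;
}

// Computes y + (-imm) as y - imm using a low-half SUBS and a high-half SBCS.
uint64_t addNegImm(uint64_t y, uint32_t imm) {
  Cpsr f;
  uint32_t lo = subs(static_cast<uint32_t>(y), imm, f);
  uint32_t hi = sbcs(static_cast<uint32_t>(y >> 32), 0, f);
  return (static_cast<uint64_t>(hi) << 32) | lo;
}
```

The interesting inputs are the ones where the low half borrows, such as `addNegImm(0x100000000, 10)`, since those exercise the flag dependency between the two instructions.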

	define void @f7() {			define void @f7() {
	entry:			entry:
	%tmp = call i64 @f8( ) ; <i64> [#uses=0]			%tmp = call i64 @f8( ) ; <i64> [#uses=0]
	ret void			ret void
				; CHECK-LABEL: f7:
				; CHECK: bl
	}			}

	declare i64 @f8()			declare i64 @f8()

	define i64 @f9(i64 %a, i64 %b) {			define i64 @f9(i64 %a, i64 %b) {
	entry:			entry:
	%tmp = sub i64 %a, %b ; <i64> [#uses=1]			%tmp = sub i64 %a, %b ; <i64> [#uses=1]
	ret i64 %tmp			ret i64 %tmp
	; CHECK-LABEL: f9:			; CHECK-LABEL: f9:
	; CHECK: sbc			; CHECK: subs r0, r0, r2
	; CHECK-NOT: sbc			; CHECK: sbcs r1, r3
	}			}

	define i64 @f(i32 %a, i32 %b) {			define i64 @f(i32 %a, i32 %b) {
	entry:			entry:
	%tmp = sext i32 %a to i64 ; <i64> [#uses=1]			%tmp = sext i32 %a to i64 ; <i64> [#uses=1]
	%tmp1 = sext i32 %b to i64 ; <i64> [#uses=1]			%tmp1 = sext i32 %b to i64 ; <i64> [#uses=1]
	%tmp2 = mul i64 %tmp1, %tmp ; <i64> [#uses=1]			%tmp2 = mul i64 %tmp1, %tmp ; <i64> [#uses=1]
	ret i64 %tmp2			ret i64 %tmp2
	; CHECK-LABEL: f:			; CHECK-LABEL: f:
				; CHECK-V6: bl __aeabi_lmul
	; CHECK-DARWIN: __muldi3			; CHECK-DARWIN: __muldi3
	}			}

	define i64 @g(i32 %a, i32 %b) {			define i64 @g(i32 %a, i32 %b) {
	entry:			entry:
	%tmp = zext i32 %a to i64 ; <i64> [#uses=1]			%tmp = zext i32 %a to i64 ; <i64> [#uses=1]
	%tmp1 = zext i32 %b to i64 ; <i64> [#uses=1]			%tmp1 = zext i32 %b to i64 ; <i64> [#uses=1]
	%tmp2 = mul i64 %tmp1, %tmp ; <i64> [#uses=1]			%tmp2 = mul i64 %tmp1, %tmp ; <i64> [#uses=1]
	ret i64 %tmp2			ret i64 %tmp2
	; CHECK-LABEL: g:			; CHECK-LABEL: g:
				; CHECK-V6: bl __aeabi_lmul
	; CHECK-DARWIN: __muldi3			; CHECK-DARWIN: __muldi3
	}			}

	define i64 @f10() {			define i64 @f10() {
	entry:			entry:
	%a = alloca i64, align 8 ; <i64*> [#uses=1]			%a = alloca i64, align 8 ; <i64*> [#uses=1]
	%retval = load i64, i64* %a ; <i64> [#uses=1]			%retval = load i64, i64* %a ; <i64> [#uses=1]
	ret i64 %retval			ret i64 %retval
				; CHECK-LABEL: f10:
				; CHECK: sub sp, #8
				; CHECK: ldr r0, [sp]
				; CHECK: ldr r1, [sp, #4]
				; CHECK: add sp, #8
	}			}

Revision Contents

Diff 90031

lib/Target/ARM/ARMISelDAGToDAG.cpp

lib/Target/ARM/ARMISelLowering.cpp

lib/Target/ARM/ARMInstrThumb.td

test/CodeGen/Thumb/long.ll
