This is an archive of the discontinued LLVM Phabricator instance.

Differential D140569

[AVR] Custom lower 32-bit shift instructions
ClosedPublic

Authored by aykevl on Dec 22 2022, 11:13 AM.

Details

Reviewers

dylanmckay
benshi001

Commits

rG840d10a1d2c9: [AVR] Custom lower 32-bit shift instructions

Summary

32-bit shift instructions were previously expanded using the default
SelectionDAG expander, which meant it used 16-bit constant shifts and
ORed them together. This works, but is far from optimal.
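
To illustrate the kind of expansion described above, here is a minimal sketch in plain C++ (not LLVM code). The helper name `shl32ViaHalves` is hypothetical, and the sketch only covers constant shift amounts between 1 and 15; the actual SelectionDAG expander is more general.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: shift a 32-bit value left by a constant 0 < n < 16
// using only 16-bit shifts and an OR, roughly in the spirit of a generic
// expansion into 16-bit parts.
uint32_t shl32ViaHalves(uint32_t value, unsigned n) {
  assert(n > 0 && n < 16 && "sketch only covers 0 < n < 16");
  uint16_t lo = static_cast<uint16_t>(value);
  uint16_t hi = static_cast<uint16_t>(value >> 16);
  uint16_t newLo = static_cast<uint16_t>(lo << n);
  // Bits shifted out of the low half are ORed into the high half.
  uint16_t newHi = static_cast<uint16_t>((hi << n) | (lo >> (16 - n)));
  return (static_cast<uint32_t>(newHi) << 16) | newLo;
}
```

On AVR each of these 16-bit shifts is itself built from single-bit shift instructions, which is presumably why the combined sequence is far from optimal.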

I've optimized 32-bit shifts on AVR using a custom inserter. This is
done using three new pseudo-instructions that take the upper and lower
bits of the value in two separate 16-bit registers and output two
16-bit registers.
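
As a rough model of that interface (plain C++, not LLVM code): the value is split into a low and a high 16-bit half, the wide operation works on both halves, and the two results are joined again. The names `split32`, `join32` and `lsrwModel` are hypothetical; in the patch the splitting and joining are done with `ISD::EXTRACT_ELEMENT` and `ISD::BUILD_PAIR`.

```cpp
#include <cstdint>
#include <utility>

// Split a 32-bit value into (lo, hi) 16-bit halves.
std::pair<uint16_t, uint16_t> split32(uint32_t value) {
  return {static_cast<uint16_t>(value), static_cast<uint16_t>(value >> 16)};
}

// Join (lo, hi) 16-bit halves back into a 32-bit value.
uint32_t join32(uint16_t lo, uint16_t hi) {
  return (static_cast<uint32_t>(hi) << 16) | lo;
}

// Hypothetical model of a wide logical shift right: two 16-bit inputs and a
// constant count in, two 16-bit outputs out. Assumes cnt < 32.
std::pair<uint16_t, uint16_t> lsrwModel(uint16_t lo, uint16_t hi,
                                        unsigned cnt) {
  return split32(join32(lo, hi) >> cnt);
}
```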

This is the first commit in a series. When completed, constant 32-bit
shifts will take around 31% fewer instructions on average, and are in
all cases equal to or better than the old behavior. The result also
tends to match or outperform avr-gcc: the only cases where avr-gcc
does better are when it uses a loop to shift, or when the LLVM register
allocator inserts some unnecessary movs. But it even outperforms avr-gcc
in some cases where avr-gcc does not use a loop.

As a side effect, non-constant 32-bit shifts also become more efficient.

For some real-world differences: the build of compiler-rt I use in
TinyGo becomes 2.7% smaller and the build of picolibc I use becomes 0.9%
smaller. I think picolibc is a better representation of real-world code,
but even a ~1% reduction in code size is really significant.

The current patch just lays the groundwork. The result is actually a
regression in code size. Later patches will use this as a basis to
optimize these shift instructions.
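
The groundwork lowering in this patch shifts the four bytes of the value one bit at a time. The following is a minimal simulation of that idea in plain C++, not the actual MachineInstr-building code. `ShiftKind`, `shiftBytesByOne` and `shift32` are hypothetical names, and carry handling is modelled with an ordinary variable rather than the AVR carry flag.

```cpp
#include <array>
#include <cstdint>

enum class ShiftKind { Shl, Srl, Sra }; // hypothetical stand-in for opcodes

// Shift four bytes by one bit. bytes[0] is the most significant byte.
void shiftBytesByOne(std::array<uint8_t, 4> &bytes, ShiftKind kind) {
  if (kind == ShiftKind::Shl) {
    // Left: add each byte to itself, lowest byte first (ADD, then ADC).
    unsigned carry = 0;
    for (int i = 3; i >= 0; --i) {
      unsigned sum = unsigned(bytes[i]) + bytes[i] + carry;
      bytes[i] = static_cast<uint8_t>(sum);
      carry = sum >> 8;
    }
    return;
  }
  // Right: the top byte uses LSR (shift in 0) or ASR (keep the sign bit),
  // the remaining bytes rotate right through the carry (ROR).
  unsigned carry = (kind == ShiftKind::Sra) ? (bytes[0] >> 7) : 0u;
  for (int i = 0; i < 4; ++i) {
    unsigned next = bytes[i] & 1u;
    bytes[i] = static_cast<uint8_t>((bytes[i] >> 1) | (carry << 7));
    carry = next;
  }
}

// Shift a 32-bit value by repeating the one-bit shift `amount` times.
uint32_t shift32(uint32_t value, ShiftKind kind, unsigned amount) {
  std::array<uint8_t, 4> bytes = {
      static_cast<uint8_t>(value >> 24), static_cast<uint8_t>(value >> 16),
      static_cast<uint8_t>(value >> 8), static_cast<uint8_t>(value)};
  for (; amount != 0; --amount)
    shiftBytesByOne(bytes, kind);
  return (uint32_t(bytes[0]) << 24) | (uint32_t(bytes[1]) << 16) |
         (uint32_t(bytes[2]) << 8) | uint32_t(bytes[3]);
}
```

Repeating a four-instruction step once per bit is simple but long for large shift amounts, which fits the summary's remark that this patch on its own is a code size regression.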

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

aykevl created this revision. Dec 22 2022, 11:13 AM

Herald added a project: Restricted Project. Dec 22 2022, 11:13 AM

Herald added subscribers: Jim, hiraditya.

aykevl requested review of this revision. Dec 22 2022, 11:13 AM

Herald added a subscriber: llvm-commits.

(testing a no-op update with arcanist)

Harbormaster completed remote builds in B204628: Diff 484903. Dec 22 2022, 11:15 AM

aykevl added a child revision: D140570: [AVR] Optimize 32-bit shift: move bytes around. Dec 22 2022, 11:16 AM

aykevl mentioned this in D138529: [AVR] Optimize constant 32-bit shifts. Dec 22 2022, 11:32 AM

(arcanist used the last commit instead of the intended commit, restoring the diff again)

Harbormaster completed remote builds in B204637: Diff 484913. Dec 22 2022, 12:31 PM

benshi001 added inline comments. Dec 23 2022, 1:19 AM

llvm/lib/Target/AVR/AVRISelLowering.cpp
291	`"Expected a constant shift amount!"`
1864	Would it be better to write `for (ssize_t i = Regs.size() - 1; i >= 0; i--) { ... if (i == Regs.size() - 1) { ... } else { ... } }`?
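
A small sketch of why a reverse loop like the one suggested here wants a signed index: with an unsigned `size_t` index the condition `i >= 0` is always true. The sketch uses `std::ptrdiff_t` as a portable stand-in for `ssize_t`, and `reversedCopy` is a hypothetical helper, not part of the patch.

```cpp
#include <cstddef>
#include <vector>

// Visit the elements from last to first using a signed index. An unsigned
// index would wrap around instead of becoming negative, so the loop
// condition `i >= 0` could never fail.
std::vector<int> reversedCopy(const std::vector<int> &values) {
  std::vector<int> out;
  for (std::ptrdiff_t i = static_cast<std::ptrdiff_t>(values.size()) - 1;
       i >= 0; --i) {
    out.push_back(values[static_cast<std::size_t>(i)]);
  }
  return out;
}
```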

LGTM. Thanks!

This revision is now accepted and ready to land. Dec 23 2022, 1:22 AM

Thanks, I'll fix these things before committing.

llvm/lib/Target/AVR/AVRISelLowering.cpp
291	Fixed locally.
1864	Fixed locally.

arsenm added a subscriber: arsenm. Dec 25 2022, 7:45 AM

arsenm added inline comments.

llvm/lib/Target/AVR/AVRISelLowering.cpp
291	You can't guarantee this condition. If you don't want to handle this properly, should use emitError, or at least report_fatal_error and not unreachable. This could easily be run into by users
318	Why can't you handle everything here as a normal custom lowering? Why do you need the custom inserter?
1858	const ref
1925–1929	Can directly initialize (also could use std::array?)

benshi001 added inline comments. Dec 25 2022, 7:19 PM

llvm/lib/Target/AVR/AVRISelLowering.cpp
291	Yes. This is problematic. I guess Ayke's intention is to let non-constant shift amounts fall through to the default, ordinary process.

benshi001 added inline comments. Dec 25 2022, 7:51 PM

llvm/lib/Target/AVR/AVRISelLowering.cpp
1864	I saw you use a capital loop variable `I` in your other patch, https://reviews.llvm.org/D140570; maybe it would be better to also use `I` instead of `i` here.

benshi001 requested changes to this revision. Dec 25 2022, 8:02 PM

This revision now requires changes to proceed. Dec 25 2022, 8:02 PM

aykevl added inline comments. Dec 26 2022, 8:33 AM

llvm/lib/Target/AVR/AVRISelLowering.cpp
291	> You can't guarantee this condition. If you don't want to handle this properly, should use emitError, or at least report_fatal_error and not unreachable. This could easily be run into by users
	How so? Non-constant shifts are expanded by an IR pass to constant shifts (in a loop), see llvm/lib/Target/AVR/AVRShiftExpand.cpp. Maybe there's a way to circumvent it, but with some testing I couldn't come up with IR that hits this assert. But I'll change it to `report_fatal_error` in an update. (I also have a local patch that removes the IR pass and instead handles non-constant shifts here.)
318	I'm not sure what you mean. How can this be done in a different way? If you mean building the entire thing in SelectionDAG, I think that would be a lot more complicated, if not impossible, because eventually I want to generate loops. I intend to introduce a later patch that lowers these shifts to a loop if emitting the full sequence results in more instructions and `minsize` is set (avr-gcc does the same).
1858	fixed
1864	That was already fixed locally :)
1925–1929	Good idea, fixed.
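
The expansion of a non-constant shift into a loop of constant shifts, as mentioned in the reply on line 291, can be pictured with this simplified C++ model. It is only an illustration of the idea; the real pass works on LLVM IR, and `shlByLoop` is a hypothetical name.

```cpp
#include <cstdint>

// Shift left by a runtime amount using only constant one-bit shifts.
uint32_t shlByLoop(uint32_t value, uint8_t amount) {
  while (amount != 0) {
    value <<= 1; // the only shift left in the loop body is a constant one
    --amount;
  }
  return value;
}
```

After such an expansion, the lowering code should only see constant shift amounts, which is the assumption debated in this thread.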

arsenm added inline comments. Dec 26 2022, 8:36 AM

llvm/lib/Target/AVR/AVRISelLowering.cpp
291	There's no verifier for this, so you still need to be more diligent about error handling. A DAG combine could still choose to introduce a non-constant shift amount, or any IR pass after the point you lowered it
318	If you need to emit control flow, you don't have much choice besides the custom inserter

apply review comments

aykevl added inline comments. Dec 26 2022, 8:45 AM

llvm/lib/Target/AVR/AVRISelLowering.cpp
291	Okay. It's a `report_fatal_error` now so it won't silently miscompile here anymore. If it ever happens, it must be very rare because before this patch it would have resulted in a linker error (`__lshlsi3` etc) instead of a compiler error.
318	Ok, understood. Next time I do an optimization that doesn't require flow control I'll try doing this in `XXXDAGToDAGISel::trySelect` (I assume that's what you mean?)
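
The point of the discussion on line 291 is that the check must stay active in every build. The sketch below models that in plain C++ with an exception as a stand-in for `report_fatal_error`; the function `requireConstantShiftAmount` and its parameters are hypothetical and not part of LLVM.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for the check in LowerShifts. `isConstant` models
// whether the shift amount operand is a constant node.
uint64_t requireConstantShiftAmount(bool isConstant, uint64_t amount) {
  if (!isConstant) {
    // Always-on check. An assertion or an "unreachable" marker may be
    // compiled out in release builds, so the bad case could go unnoticed.
    throw std::runtime_error("Expected a constant shift amount!");
  }
  return amount;
}
```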

Harbormaster completed remote builds in B204926: Diff 485309. Dec 26 2022, 9:54 AM

Update to use opcodes instead of ShiftAmt and ArithmeticShift.

@benshi001 I updated the patch following your comment here: https://reviews.llvm.org/D138529#inline-1345614. While working on lowering the shift as a loop I found opcodes to be easier to work with. If you agree with this change, I will update all other patches as well.

Harbormaster completed remote builds in B204930: Diff 485313. Dec 26 2022, 10:28 AM

benshi001 added a comment. Dec 26 2022, 7:32 PM

This comment was removed by benshi001.

llvm/lib/Target/AVR/AVRISelLowering.cpp
1858	Can it be `const DebugLoc &dl = MI.getDebugLoc();`, so that `dl` is a reference rather than a local copy?
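
A minimal sketch of the difference this comment is about, using hypothetical stand-in types (`Loc`, `Instr`) rather than the real `DebugLoc` and `MachineInstr`. It assumes the getter returns a const reference: binding the result to a value makes a copy, binding it to a const reference does not.

```cpp
// Hypothetical stand-in for DebugLoc that counts how often it is copied.
struct Loc {
  static int copies;
  Loc() = default;
  Loc(const Loc &) { ++copies; }
};
int Loc::copies = 0;

// Hypothetical stand-in for MachineInstr with a getter returning a reference.
struct Instr {
  Loc loc;
  const Loc &getLoc() const { return loc; }
};

int copiesWhenBoundByValue() {
  Loc::copies = 0;
  Instr mi;
  const Loc dl = mi.getLoc(); // copies the object
  (void)dl;
  return Loc::copies;
}

int copiesWhenBoundByReference() {
  Loc::copies = 0;
  Instr mi;
  const Loc &dl = mi.getLoc(); // refers to the object inside mi, no copy
  (void)dl;
  return Loc::copies;
}
```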

In D140569#4016932, @aykevl wrote:

Update to use opcodes instead of ShiftAmt and ArithmeticShift.

@benshi001 I updated the patch following your comment here: https://reviews.llvm.org/D138529#inline-1345614. While working on lowering the shift as a loop I found opcodes to be easier to work with. If you agree with this change, I will update all other patches as well.

Sure. Please go ahead. I do prefer the solution of exposing opcode.

change const DebugLoc dl to const DebugLoc &dl

Harbormaster completed remote builds in B205243: Diff 485730. Dec 30 2022, 5:14 PM

Besides my latest two inline comments, there is a failure of LLVM.CodeGen/AVR::shift.ll in the pre-merge checks on both Windows and Linux; I think you need to check it.

llvm/lib/Target/AVR/AVRISelLowering.cpp
1860	Can it be `const bool ShiftLeft = Opc == ISD::SHL;`?
1861	`const bool ArithmeticShift = Opc == ISD::SRA;`
1886	This loop variable needs to be `I`, like the loop above.
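
The two suggested declarations derive the shift direction and kind from the opcode. A self-contained sketch of that pattern, with a hypothetical `ShiftOpcode` enum standing in for the `ISD` opcodes and a hypothetical `ShiftFlags` struct:

```cpp
// Hypothetical stand-in for ISD::SHL, ISD::SRL and ISD::SRA.
enum class ShiftOpcode { SHL, SRL, SRA };

struct ShiftFlags {
  bool shiftLeft;
  bool arithmetic;
};

// Derive both flags from the opcode, so callers only pass the opcode.
ShiftFlags flagsFor(ShiftOpcode opc) {
  const bool shiftLeft = opc == ShiftOpcode::SHL;
  const bool arithmetic = opc == ShiftOpcode::SRA;
  return {shiftLeft, arithmetic};
}
```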

aykevl mentioned this in D140777: [AVR] Fix some ambiguous cases in AsmParser. Jan 1 2023, 11:00 AM

Looks like 3bb5ddd1756d4573d3104f8b86d2973dbc550402 broke the pre-merge check. I'll rebase the patch.

Special-case logical shifts of 16 bits. This fixes a number of issues: it avoids unnecessary code changes in this PR, it fixes an issue after rebasing on the main branch (as seen in the buildbot failure), and it fixes an issue I found while working on D140822.
Apply review feedback.

This is ready for review again.

aykevl mentioned this in D140570: [AVR] Optimize 32-bit shift: move bytes around. Jan 1 2023, 1:27 PM

run clang-format

Harbormaster completed remote builds in B205318: Diff 485818. Jan 1 2023, 2:11 PM

benshi001 added inline comments. Jan 1 2023, 6:15 PM

llvm/lib/Target/AVR/AVRISelLowering.cpp
303	Is this necessary? Can it be covered by https://reviews.llvm.org/D140570? If you think it is better to handle `ShiftAmount == 16` here, I suggest you add tests for that in the current patch.
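
For reference, the `ShiftAmount == 16` special case needs no shift instructions at all: a logical shift by 16 only moves one 16-bit half into the other and zeroes the rest. A plain C++ model of the two cases, with hypothetical helper names and results given as (lo, hi) pairs:

```cpp
#include <cstdint>
#include <utility>

// Logical shift left by 16: the low half becomes the high half.
std::pair<uint16_t, uint16_t> shl16Halves(uint16_t srcLo, uint16_t srcHi) {
  (void)srcHi; // the old high half is shifted out entirely
  return {0, srcLo};
}

// Logical shift right by 16: the high half becomes the low half.
std::pair<uint16_t, uint16_t> srl16Halves(uint16_t srcLo, uint16_t srcHi) {
  (void)srcLo; // the old low half is shifted out entirely
  return {srcHi, 0};
}

// Join a (lo, hi) pair into a 32-bit value.
uint32_t joinHalves(std::pair<uint16_t, uint16_t> halves) {
  return (static_cast<uint32_t>(halves.second) << 16) | halves.first;
}
```

An arithmetic shift right by 16 is not covered by this model, because the high half would have to be filled with copies of the sign bit rather than zero.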

In D140569#4021340, @aykevl wrote:

Special-case logical shifts of 16 bits. This fixes a number of issues: it avoids unnecessary code changes in this PR, it fixes an issue after rebasing on the main branch (as seen in the buildbot failure), and it fixes an issue I found while working on D140822.

Apply review feedback.

This is ready for review again.

I see your idea, it is great. My only concern is that it would be better to add tests for the 16-bit shifts in the current patch. You can do that while committing.

benshi001 accepted this revision. Jan 1 2023, 6:58 PM

This revision is now accepted and ready to land. Jan 1 2023, 6:58 PM

Closed by commit rG840d10a1d2c9: [AVR] Custom lower 32-bit shift instructions (authored by aykevl). Jan 8 2023, 11:06 AM

This revision was automatically updated to reflect the committed changes.

aykevl added a commit: rG840d10a1d2c9: [AVR] Custom lower 32-bit shift instructions.

Revision Contents

Path	Size
llvm/lib/Target/AVR/AVRISelLowering.h	5 lines
llvm/lib/Target/AVR/AVRISelLowering.cpp	169 lines
llvm/lib/Target/AVR/AVRInstrInfo.td	18 lines
llvm/test/CodeGen/AVR/shift32.ll	129 lines

Diff 487207

llvm/lib/Target/AVR/AVRISelLowering.h

Show All 33 Lines	enum NodeType {
CALL,		CALL,
/// A wrapper node for TargetConstantPool,		/// A wrapper node for TargetConstantPool,
/// TargetExternalSymbol, and TargetGlobalAddress.		/// TargetExternalSymbol, and TargetGlobalAddress.
WRAPPER,		WRAPPER,
LSL, ///< Logical shift left.		LSL, ///< Logical shift left.
LSLBN, ///< Byte logical shift left N bits.		LSLBN, ///< Byte logical shift left N bits.
LSLWN, ///< Word logical shift left N bits.		LSLWN, ///< Word logical shift left N bits.
LSLHI, ///< Higher 8-bit of word logical shift left.		LSLHI, ///< Higher 8-bit of word logical shift left.
		LSLW, ///< Wide logical shift left.
LSR, ///< Logical shift right.		LSR, ///< Logical shift right.
LSRBN, ///< Byte logical shift right N bits.		LSRBN, ///< Byte logical shift right N bits.
LSRWN, ///< Word logical shift right N bits.		LSRWN, ///< Word logical shift right N bits.
LSRLO, ///< Lower 8-bit of word logical shift right.		LSRLO, ///< Lower 8-bit of word logical shift right.
		LSRW, ///< Wide logical shift right.
ASR, ///< Arithmetic shift right.		ASR, ///< Arithmetic shift right.
ASRBN, ///< Byte arithmetic shift right N bits.		ASRBN, ///< Byte arithmetic shift right N bits.
ASRWN, ///< Word arithmetic shift right N bits.		ASRWN, ///< Word arithmetic shift right N bits.
ASRLO, ///< Lower 8-bit of word arithmetic shift right.		ASRLO, ///< Lower 8-bit of word arithmetic shift right.
		ASRW, ///< Wide arithmetic shift right.
ROR, ///< Bit rotate right.		ROR, ///< Bit rotate right.
ROL, ///< Bit rotate left.		ROL, ///< Bit rotate left.
LSLLOOP, ///< A loop of single logical shift left instructions.		LSLLOOP, ///< A loop of single logical shift left instructions.
LSRLOOP, ///< A loop of single logical shift right instructions.		LSRLOOP, ///< A loop of single logical shift right instructions.
ROLLOOP, ///< A loop of single left bit rotate instructions.		ROLLOOP, ///< A loop of single left bit rotate instructions.
RORLOOP, ///< A loop of single right bit rotate instructions.		RORLOOP, ///< A loop of single right bit rotate instructions.
ASRLOOP, ///< A loop of single arithmetic shift right instructions.		ASRLOOP, ///< A loop of single arithmetic shift right instructions.
/// AVR conditional branches. Operand 0 is the chain operand, operand 1		/// AVR conditional branches. Operand 0 is the chain operand, operand 1
▲ Show 20 Lines • Show All 123 Lines • ▼ Show 20 Lines	SDValue LowerCallResult(SDValue Chain, SDValue InFlag,
const SDLoc &dl, SelectionDAG &DAG,		const SDLoc &dl, SelectionDAG &DAG,
SmallVectorImpl<SDValue> &InVals) const;		SmallVectorImpl<SDValue> &InVals) const;

protected:		protected:
const AVRSubtarget &Subtarget;		const AVRSubtarget &Subtarget;

private:		private:
MachineBasicBlock *insertShift(MachineInstr &MI, MachineBasicBlock *BB) const;		MachineBasicBlock *insertShift(MachineInstr &MI, MachineBasicBlock *BB) const;
		MachineBasicBlock *insertWideShift(MachineInstr &MI,
		MachineBasicBlock *BB) const;
MachineBasicBlock *insertMul(MachineInstr &MI, MachineBasicBlock *BB) const;		MachineBasicBlock *insertMul(MachineInstr &MI, MachineBasicBlock *BB) const;
MachineBasicBlock *insertCopyZero(MachineInstr &MI,		MachineBasicBlock *insertCopyZero(MachineInstr &MI,
MachineBasicBlock *BB) const;		MachineBasicBlock *BB) const;
MachineBasicBlock *insertAtomicArithmeticOp(MachineInstr &MI,		MachineBasicBlock *insertAtomicArithmeticOp(MachineInstr &MI,
MachineBasicBlock *BB,		MachineBasicBlock *BB,
unsigned Opcode, int Width) const;		unsigned Opcode, int Width) const;
};		};

} // end namespace llvm		} // end namespace llvm

#endif // LLVM_AVR_ISEL_LOWERING_H		#endif // LLVM_AVR_ISEL_LOWERING_H

llvm/lib/Target/AVR/AVRISelLowering.cpp

Show First 20 Lines • Show All 82 Lines • ▼ Show 20 Lines	AVRTargetLowering::AVRTargetLowering(const AVRTargetMachine &TM,
// our shift instructions are only able to shift 1 bit at a time, so handle		// our shift instructions are only able to shift 1 bit at a time, so handle
// this in a custom way.		// this in a custom way.
setOperationAction(ISD::SRA, MVT::i8, Custom);		setOperationAction(ISD::SRA, MVT::i8, Custom);
setOperationAction(ISD::SHL, MVT::i8, Custom);		setOperationAction(ISD::SHL, MVT::i8, Custom);
setOperationAction(ISD::SRL, MVT::i8, Custom);		setOperationAction(ISD::SRL, MVT::i8, Custom);
setOperationAction(ISD::SRA, MVT::i16, Custom);		setOperationAction(ISD::SRA, MVT::i16, Custom);
setOperationAction(ISD::SHL, MVT::i16, Custom);		setOperationAction(ISD::SHL, MVT::i16, Custom);
setOperationAction(ISD::SRL, MVT::i16, Custom);		setOperationAction(ISD::SRL, MVT::i16, Custom);
		setOperationAction(ISD::SRA, MVT::i32, Custom);
		setOperationAction(ISD::SHL, MVT::i32, Custom);
		setOperationAction(ISD::SRL, MVT::i32, Custom);
setOperationAction(ISD::SHL_PARTS, MVT::i16, Expand);		setOperationAction(ISD::SHL_PARTS, MVT::i16, Expand);
setOperationAction(ISD::SRA_PARTS, MVT::i16, Expand);		setOperationAction(ISD::SRA_PARTS, MVT::i16, Expand);
setOperationAction(ISD::SRL_PARTS, MVT::i16, Expand);		setOperationAction(ISD::SRL_PARTS, MVT::i16, Expand);

setOperationAction(ISD::ROTL, MVT::i8, Custom);		setOperationAction(ISD::ROTL, MVT::i8, Custom);
setOperationAction(ISD::ROTL, MVT::i16, Expand);		setOperationAction(ISD::ROTL, MVT::i16, Expand);
setOperationAction(ISD::ROTR, MVT::i8, Custom);		setOperationAction(ISD::ROTR, MVT::i8, Custom);
setOperationAction(ISD::ROTR, MVT::i16, Expand);		setOperationAction(ISD::ROTR, MVT::i16, Expand);
▲ Show 20 Lines • Show All 143 Lines • ▼ Show 20 Lines	#define NODE(name) \
switch (Opcode) {		switch (Opcode) {
default:		default:
return nullptr;		return nullptr;
NODE(RET_FLAG);		NODE(RET_FLAG);
NODE(RETI_FLAG);		NODE(RETI_FLAG);
NODE(CALL);		NODE(CALL);
NODE(WRAPPER);		NODE(WRAPPER);
NODE(LSL);		NODE(LSL);
		NODE(LSLW);
NODE(LSR);		NODE(LSR);
		NODE(LSRW);
NODE(ROL);		NODE(ROL);
NODE(ROR);		NODE(ROR);
NODE(ASR);		NODE(ASR);
		NODE(ASRW);
NODE(LSLLOOP);		NODE(LSLLOOP);
NODE(LSRLOOP);		NODE(LSRLOOP);
NODE(ROLLOOP);		NODE(ROLLOOP);
NODE(RORLOOP);		NODE(RORLOOP);
NODE(ASRLOOP);		NODE(ASRLOOP);
NODE(BRCOND);		NODE(BRCOND);
NODE(CMP);		NODE(CMP);
NODE(CMPC);		NODE(CMPC);
Show All 12 Lines
SDValue AVRTargetLowering::LowerShifts(SDValue Op, SelectionDAG &DAG) const {		SDValue AVRTargetLowering::LowerShifts(SDValue Op, SelectionDAG &DAG) const {
unsigned Opc8;		unsigned Opc8;
const SDNode *N = Op.getNode();		const SDNode *N = Op.getNode();
EVT VT = Op.getValueType();		EVT VT = Op.getValueType();
SDLoc dl(N);		SDLoc dl(N);
assert(isPowerOf2_32(VT.getSizeInBits()) &&		assert(isPowerOf2_32(VT.getSizeInBits()) &&
"Expected power-of-2 shift amount");		"Expected power-of-2 shift amount");

		if (VT.getSizeInBits() == 32) {
		if (!isa<ConstantSDNode>(N->getOperand(1))) {
		// 32-bit shifts are converted to a loop in IR.
		// This should be unreachable.
		report_fatal_error("Expected a constant shift amount!");
		}
		SDVTList ResTys = DAG.getVTList(MVT::i16, MVT::i16);
		SDValue SrcLo =
		DAG.getNode(ISD::EXTRACT_ELEMENT, dl, MVT::i16, Op.getOperand(0),
		DAG.getConstant(0, dl, MVT::i16));
		SDValue SrcHi =
		DAG.getNode(ISD::EXTRACT_ELEMENT, dl, MVT::i16, Op.getOperand(0),
		DAG.getConstant(1, dl, MVT::i16));
		uint64_t ShiftAmount =
		cast<ConstantSDNode>(N->getOperand(1))->getZExtValue();
		if (ShiftAmount == 16) {
		// Special case these two operations because they appear to be used by the
		// generic codegen parts to lower 32-bit numbers.
		// TODO: perhaps we can lower shift amounts bigger than 16 to a 16-bit
		// shift of a part of the 32-bit value?
		switch (Op.getOpcode()) {
		case ISD::SHL: {
		SDValue Zero = DAG.getConstant(0, dl, MVT::i16);
		return DAG.getNode(ISD::BUILD_PAIR, dl, MVT::i32, Zero, SrcLo);
		}
		case ISD::SRL: {
		SDValue Zero = DAG.getConstant(0, dl, MVT::i16);
		return DAG.getNode(ISD::BUILD_PAIR, dl, MVT::i32, SrcHi, Zero);
		}
		}
		}
		SDValue Cnt = DAG.getTargetConstant(ShiftAmount, dl, MVT::i8);
		unsigned Opc;
		switch (Op.getOpcode()) {
		default:
		llvm_unreachable("Invalid 32-bit shift opcode!");
		case ISD::SHL:
		Opc = AVRISD::LSLW;
		break;
		case ISD::SRL:
		Opc = AVRISD::LSRW;
		break;
		case ISD::SRA:
		Opc = AVRISD::ASRW;
		break;
		}
		SDValue Result = DAG.getNode(Opc, dl, ResTys, SrcLo, SrcHi, Cnt);
		return DAG.getNode(ISD::BUILD_PAIR, dl, MVT::i32, Result.getValue(0),
		Result.getValue(1));
		}

// Expand non-constant shifts to loops.		// Expand non-constant shifts to loops.
if (!isa<ConstantSDNode>(N->getOperand(1))) {		if (!isa<ConstantSDNode>(N->getOperand(1))) {
switch (Op.getOpcode()) {		switch (Op.getOpcode()) {
default:		default:
llvm_unreachable("Invalid shift opcode!");		llvm_unreachable("Invalid shift opcode!");
case ISD::SHL:		case ISD::SHL:
return DAG.getNode(AVRISD::LSLLOOP, dl, VT, N->getOperand(0),		return DAG.getNode(AVRISD::LSLLOOP, dl, VT, N->getOperand(0),
N->getOperand(1));		N->getOperand(1));
▲ Show 20 Lines • Show All 1,494 Lines • ▼ Show 20 Lines	MachineBasicBlock *AVRTargetLowering::insertShift(MachineInstr &MI,

BuildMI(CheckBB, dl, TII.get(AVR::DECRd), ShiftAmtReg2).addReg(ShiftAmtReg);		BuildMI(CheckBB, dl, TII.get(AVR::DECRd), ShiftAmtReg2).addReg(ShiftAmtReg);
BuildMI(CheckBB, dl, TII.get(AVR::BRPLk)).addMBB(LoopBB);		BuildMI(CheckBB, dl, TII.get(AVR::BRPLk)).addMBB(LoopBB);

MI.eraseFromParent(); // The pseudo instruction is gone now.		MI.eraseFromParent(); // The pseudo instruction is gone now.
return RemBB;		return RemBB;
}		}

		// Do a multibyte AVR shift. Insert shift instructions and put the output
		// registers in the Regs array.
		// Because AVR does not have a normal shift instruction (only a single bit shift
		// instruction), we have to emulate this behavior with other instructions.
		static void insertMultibyteShift(MachineInstr &MI, MachineBasicBlock *BB,
		MutableArrayRef<std::pair<Register, int>> Regs,
		ISD::NodeType Opc, int64_t ShiftAmt) {
		const TargetInstrInfo &TII = *BB->getParent()->getSubtarget().getInstrInfo();
		MachineRegisterInfo &MRI = BB->getParent()->getRegInfo();
		const DebugLoc &dl = MI.getDebugLoc();

		const bool ShiftLeft = Opc == ISD::SHL;
		const bool ArithmeticShift = Opc == ISD::SRA;

		// Shift by one. This is the fallback that always works, and the shift
		// operation that is used for 1, 2, and 3 bit shifts.
		while (ShiftLeft && ShiftAmt) {
		// Shift one to the left.
		for (ssize_t I = Regs.size() - 1; I >= 0; I--) {
		Register Out = MRI.createVirtualRegister(&AVR::GPR8RegClass);
		Register In = Regs[I].first;
		Register InSubreg = Regs[I].second;
		if (I == (ssize_t)Regs.size() - 1) { // first iteration
		BuildMI(*BB, MI, dl, TII.get(AVR::ADDRdRr), Out)
		.addReg(In, 0, InSubreg)
		.addReg(In, 0, InSubreg);
		} else {
		BuildMI(*BB, MI, dl, TII.get(AVR::ADCRdRr), Out)
		.addReg(In, 0, InSubreg)
		.addReg(In, 0, InSubreg);
		}
		Regs[I] = std::pair(Out, 0);
		}
		ShiftAmt--;
		}
		while (!ShiftLeft && ShiftAmt) {
		// Shift one to the right.
		for (size_t I = 0; I < Regs.size(); I++) {
		Register Out = MRI.createVirtualRegister(&AVR::GPR8RegClass);
		Register In = Regs[I].first;
		Register InSubreg = Regs[I].second;
		if (I == 0) {
		unsigned Opc = ArithmeticShift ? AVR::ASRRd : AVR::LSRRd;
		BuildMI(*BB, MI, dl, TII.get(Opc), Out).addReg(In, 0, InSubreg);
		} else {
		BuildMI(*BB, MI, dl, TII.get(AVR::RORRd), Out).addReg(In, 0, InSubreg);
		}
		Regs[I] = std::pair(Out, 0);
		}
		ShiftAmt--;
		}

		if (ShiftAmt != 0) {
		llvm_unreachable("don't know how to shift!"); // sanity check
		}
		}

		// Do a wide (32-bit) shift.
		MachineBasicBlock *
		AVRTargetLowering::insertWideShift(MachineInstr &MI,
		MachineBasicBlock *BB) const {
		const TargetInstrInfo &TII = *Subtarget.getInstrInfo();
		const DebugLoc &dl = MI.getDebugLoc();

		// How much to shift to the right (meaning: a negative number indicates a left
		// shift).
		int64_t ShiftAmt = MI.getOperand(4).getImm();
		ISD::NodeType Opc;
		switch (MI.getOpcode()) {
		case AVR::Lsl32:
		Opc = ISD::SHL;
		break;
		case AVR::Lsr32:
		Opc = ISD::SRL;
		break;
		case AVR::Asr32:
		Opc = ISD::SRA;
		break;
		}

		// Read the input registers, with the most significant register at index 0.
		std::array<std::pair<Register, int>, 4> Registers = {
		std::pair(MI.getOperand(3).getReg(), AVR::sub_hi),
		std::pair(MI.getOperand(3).getReg(), AVR::sub_lo),
		std::pair(MI.getOperand(2).getReg(), AVR::sub_hi),
		std::pair(MI.getOperand(2).getReg(), AVR::sub_lo),
		};

		// Do the shift. The registers are modified in-place.
		insertMultibyteShift(MI, BB, Registers, Opc, ShiftAmt);

		// Combine the 8-bit registers into 16-bit register pairs.
		BuildMI(*BB, MI, dl, TII.get(AVR::REG_SEQUENCE), MI.getOperand(1).getReg())
		.addReg(Registers[0].first, 0, Registers[0].second)
		.addImm(AVR::sub_hi)
		.addReg(Registers[1].first, 0, Registers[1].second)
		.addImm(AVR::sub_lo);
		BuildMI(*BB, MI, dl, TII.get(AVR::REG_SEQUENCE), MI.getOperand(0).getReg())
		.addReg(Registers[2].first, 0, Registers[2].second)
		.addImm(AVR::sub_hi)
		.addReg(Registers[3].first, 0, Registers[3].second)
		.addImm(AVR::sub_lo);

		// Remove the pseudo instruction.
		MI.eraseFromParent();
		return BB;
		}

static bool isCopyMulResult(MachineBasicBlock::iterator const &I) {		static bool isCopyMulResult(MachineBasicBlock::iterator const &I) {
if (I->getOpcode() == AVR::COPY) {		if (I->getOpcode() == AVR::COPY) {
Register SrcReg = I->getOperand(1).getReg();		Register SrcReg = I->getOperand(1).getReg();
return (SrcReg == AVR::R0 || SrcReg == AVR::R1);		return (SrcReg == AVR::R0 || SrcReg == AVR::R1);
}		}

return false;		return false;
}		}
▲ Show 20 Lines • Show All 96 Lines • ▼ Show 20 Lines	AVRTargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
case AVR::Lsr16:		case AVR::Lsr16:
case AVR::Rol8:		case AVR::Rol8:
case AVR::Rol16:		case AVR::Rol16:
case AVR::Ror8:		case AVR::Ror8:
case AVR::Ror16:		case AVR::Ror16:
case AVR::Asr8:		case AVR::Asr8:
case AVR::Asr16:		case AVR::Asr16:
return insertShift(MI, MBB);		return insertShift(MI, MBB);
		case AVR::Lsl32:
		case AVR::Lsr32:
		case AVR::Asr32:
		return insertWideShift(MI, MBB);
case AVR::MULRdRr:		case AVR::MULRdRr:
case AVR::MULSRdRr:		case AVR::MULSRdRr:
return insertMul(MI, MBB);		return insertMul(MI, MBB);
case AVR::CopyZero:		case AVR::CopyZero:
return insertCopyZero(MI, MBB);		return insertCopyZero(MI, MBB);
case AVR::AtomicLoadAdd8:		case AVR::AtomicLoadAdd8:
return insertAtomicArithmeticOp(MI, MBB, AVR::ADDRdRr, 8);		return insertAtomicArithmeticOp(MI, MBB, AVR::ADDRdRr, 8);
case AVR::AtomicLoadAdd16:		case AVR::AtomicLoadAdd16:
▲ Show 20 Lines • Show All 463 Lines • Show Last 20 Lines

llvm/lib/Target/AVR/AVRInstrInfo.td

Show First 20 Lines • Show All 63 Lines • ▼ Show 20 Lines
def AVRlsrlo : SDNode<"AVRISD::LSRLO", SDTIntUnaryOp>;		def AVRlsrlo : SDNode<"AVRISD::LSRLO", SDTIntUnaryOp>;
def AVRasrlo : SDNode<"AVRISD::ASRLO", SDTIntUnaryOp>;		def AVRasrlo : SDNode<"AVRISD::ASRLO", SDTIntUnaryOp>;
def AVRlslbn : SDNode<"AVRISD::LSLBN", SDTIntBinOp>;		def AVRlslbn : SDNode<"AVRISD::LSLBN", SDTIntBinOp>;
def AVRlsrbn : SDNode<"AVRISD::LSRBN", SDTIntBinOp>;		def AVRlsrbn : SDNode<"AVRISD::LSRBN", SDTIntBinOp>;
def AVRasrbn : SDNode<"AVRISD::ASRBN", SDTIntBinOp>;		def AVRasrbn : SDNode<"AVRISD::ASRBN", SDTIntBinOp>;
def AVRlslwn : SDNode<"AVRISD::LSLWN", SDTIntBinOp>;		def AVRlslwn : SDNode<"AVRISD::LSLWN", SDTIntBinOp>;
def AVRlsrwn : SDNode<"AVRISD::LSRWN", SDTIntBinOp>;		def AVRlsrwn : SDNode<"AVRISD::LSRWN", SDTIntBinOp>;
def AVRasrwn : SDNode<"AVRISD::ASRWN", SDTIntBinOp>;		def AVRasrwn : SDNode<"AVRISD::ASRWN", SDTIntBinOp>;
		def AVRlslw : SDNode<"AVRISD::LSLW", SDTIntShiftDOp>;
		def AVRlsrw : SDNode<"AVRISD::LSRW", SDTIntShiftDOp>;
		def AVRasrw : SDNode<"AVRISD::ASRW", SDTIntShiftDOp>;

// Pseudo shift nodes for non-constant shift amounts.		// Pseudo shift nodes for non-constant shift amounts.
def AVRlslLoop : SDNode<"AVRISD::LSLLOOP", SDTIntShiftOp>;		def AVRlslLoop : SDNode<"AVRISD::LSLLOOP", SDTIntShiftOp>;
def AVRlsrLoop : SDNode<"AVRISD::LSRLOOP", SDTIntShiftOp>;		def AVRlsrLoop : SDNode<"AVRISD::LSRLOOP", SDTIntShiftOp>;
def AVRrolLoop : SDNode<"AVRISD::ROLLOOP", SDTIntShiftOp>;		def AVRrolLoop : SDNode<"AVRISD::ROLLOOP", SDTIntShiftOp>;
def AVRrorLoop : SDNode<"AVRISD::RORLOOP", SDTIntShiftOp>;		def AVRrorLoop : SDNode<"AVRISD::RORLOOP", SDTIntShiftOp>;
def AVRasrLoop : SDNode<"AVRISD::ASRLOOP", SDTIntShiftOp>;		def AVRasrLoop : SDNode<"AVRISD::ASRLOOP", SDTIntShiftOp>;

def Lsl16 : ShiftPseudo<(outs DREGS		def Lsl16 : ShiftPseudo<(outs DREGS
(ins DREGS		(ins DREGS
: $src, GPR8		: $src, GPR8
: $cnt),		: $cnt),
"# Lsl16 PSEUDO", [(set i16		"# Lsl16 PSEUDO", [(set i16
: $dst, (AVRlslLoop i16		: $dst, (AVRlslLoop i16
: $src, i8		: $src, i8
: $cnt))]>;		: $cnt))]>;

		def Lsl32 : ShiftPseudo<(outs DREGS:$dstlo, DREGS:$dsthi),
		(ins DREGS:$srclo, DREGS:$srchi, i8imm:$cnt),
		"# Lsl32 PSEUDO",
		[(set i16:$dstlo, i16:$dsthi, (AVRlslw i16:$srclo, i16:$srchi, i8:$cnt))]>;

def Lsr8 : ShiftPseudo<(outs GPR8		def Lsr8 : ShiftPseudo<(outs GPR8
: $dst),		: $dst),
(ins GPR8		(ins GPR8
: $src, GPR8		: $src, GPR8
: $cnt),		: $cnt),
"# Lsr8 PSEUDO", [(set i8		"# Lsr8 PSEUDO", [(set i8
: $dst, (AVRlsrLoop i8		: $dst, (AVRlsrLoop i8
: $src, i8		: $src, i8
: $cnt))]>;		: $cnt))]>;

def Lsr16 : ShiftPseudo<(outs DREGS		def Lsr16 : ShiftPseudo<(outs DREGS
: $dst),		: $dst),
(ins DREGS		(ins DREGS
: $src, GPR8		: $src, GPR8
: $cnt),		: $cnt),
"# Lsr16 PSEUDO", [(set i16		"# Lsr16 PSEUDO", [(set i16
: $dst, (AVRlsrLoop i16		: $dst, (AVRlsrLoop i16
: $src, i8		: $src, i8
: $cnt))]>;		: $cnt))]>;

		def Lsr32 : ShiftPseudo<(outs DREGS:$dstlo, DREGS:$dsthi),
		(ins DREGS:$srclo, DREGS:$srchi, i8imm:$cnt),
		"# Lsr32 PSEUDO",
		[(set i16:$dstlo, i16:$dsthi, (AVRlsrw i16:$srclo, i16:$srchi, i8:$cnt))]>;

def Rol8 : ShiftPseudo<(outs GPR8		def Rol8 : ShiftPseudo<(outs GPR8
: $dst),		: $dst),
(ins GPR8		(ins GPR8
: $src, GPR8		: $src, GPR8
: $cnt),		: $cnt),
"# Rol8 PSEUDO", [(set i8		"# Rol8 PSEUDO", [(set i8
: $dst, (AVRrolLoop i8		: $dst, (AVRrolLoop i8
: $src, i8		: $src, i8
def Asr16 : ShiftPseudo<(outs DREGS		def Asr16 : ShiftPseudo<(outs DREGS
(ins DREGS		(ins DREGS
: $src, GPR8		: $src, GPR8
: $cnt),		: $cnt),
"# Asr16 PSEUDO", [(set i16		"# Asr16 PSEUDO", [(set i16
: $dst, (AVRasrLoop i16		: $dst, (AVRasrLoop i16
: $src, i8		: $src, i8
: $cnt))]>;		: $cnt))]>;

		def Asr32 : ShiftPseudo<(outs DREGS:$dstlo, DREGS:$dsthi),
		(ins DREGS:$srclo, DREGS:$srchi, i8imm:$cnt),
		"# Asr32 PSEUDO",
		[(set i16:$dstlo, i16:$dsthi, (AVRasrw i16:$srclo, i16:$srchi, i8:$cnt))]>;

// lowered to a copy from the zero register.		// lowered to a copy from the zero register.
let usesCustomInserter=1 in		let usesCustomInserter=1 in
def CopyZero : Pseudo<(outs GPR8:$rd), (ins), "clrz\t$rd", [(set i8:$rd, 0)]>;		def CopyZero : Pseudo<(outs GPR8:$rd), (ins), "clrz\t$rd", [(set i8:$rd, 0)]>;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Non-Instruction Patterns		// Non-Instruction Patterns
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//


llvm/test/CodeGen/AVR/shift32.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple=avr -mattr=movw -verify-machineinstrs | FileCheck %s

				define i32 @shl_i32_1(i32 %a) {
				; CHECK-LABEL: shl_i32_1:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: lsl r22
				; CHECK-NEXT: rol r23
				; CHECK-NEXT: rol r24
				; CHECK-NEXT: rol r25
				; CHECK-NEXT: ret
				%res = shl i32 %a, 1
				ret i32 %res
				}

				define i32 @shl_i32_2(i32 %a) {
				; CHECK-LABEL: shl_i32_2:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: lsl r22
				; CHECK-NEXT: rol r23
				; CHECK-NEXT: rol r24
				; CHECK-NEXT: rol r25
				; CHECK-NEXT: lsl r22
				; CHECK-NEXT: rol r23
				; CHECK-NEXT: rol r24
				; CHECK-NEXT: rol r25
				; CHECK-NEXT: ret
				%res = shl i32 %a, 2
				ret i32 %res
				}

				; This is a special case: this shift is performed directly inside SelectionDAG
				; instead of as a custom lowering like the other shift operations.
				define i32 @shl_i32_16(i32 %a) {
				; CHECK-LABEL: shl_i32_16:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: movw r24, r22
				; CHECK-NEXT: ldi r22, 0
				; CHECK-NEXT: ldi r23, 0
				; CHECK-NEXT: ret
				%res = shl i32 %a, 16
				ret i32 %res
				}

				; Combined with the register allocator, shift instructions can sometimes be
				; optimized away entirely. The least significant registers are simply stored
				; directly instead of moving them first.
				define void @shl_i32_16_ptr(i32 %a, ptr %ptr) {
				; CHECK-LABEL: shl_i32_16_ptr:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: movw r30, r20
				; CHECK-NEXT: std Z+2, r22
				; CHECK-NEXT: std Z+3, r23
				; CHECK-NEXT: ldi r24, 0
				; CHECK-NEXT: ldi r25, 0
				; CHECK-NEXT: st Z, r24
				; CHECK-NEXT: std Z+1, r25
				; CHECK-NEXT: ret
				%res = shl i32 %a, 16
				store i32 %res, ptr %ptr
				ret void
				}

				define i32 @lshr_i32_1(i32 %a) {
				; CHECK-LABEL: lshr_i32_1:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: lsr r25
				; CHECK-NEXT: ror r24
				; CHECK-NEXT: ror r23
				; CHECK-NEXT: ror r22
				; CHECK-NEXT: ret
				%res = lshr i32 %a, 1
				ret i32 %res
				}

				define i32 @lshr_i32_2(i32 %a) {
				; CHECK-LABEL: lshr_i32_2:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: lsr r25
				; CHECK-NEXT: ror r24
				; CHECK-NEXT: ror r23
				; CHECK-NEXT: ror r22
				; CHECK-NEXT: lsr r25
				; CHECK-NEXT: ror r24
				; CHECK-NEXT: ror r23
				; CHECK-NEXT: ror r22
				; CHECK-NEXT: ret
				%res = lshr i32 %a, 2
				ret i32 %res
				}

				define i32 @lshr_i32_16(i32 %a) {
				; CHECK-LABEL: lshr_i32_16:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: movw r22, r24
				; CHECK-NEXT: ldi r24, 0
				; CHECK-NEXT: ldi r25, 0
				; CHECK-NEXT: ret
				%res = lshr i32 %a, 16
				ret i32 %res
				}

				define i32 @ashr_i32_1(i32 %a) {
				; CHECK-LABEL: ashr_i32_1:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: asr r25
				; CHECK-NEXT: ror r24
				; CHECK-NEXT: ror r23
				; CHECK-NEXT: ror r22
				; CHECK-NEXT: ret
				%res = ashr i32 %a, 1
				ret i32 %res
				}

				define i32 @ashr_i32_2(i32 %a) {
				; CHECK-LABEL: ashr_i32_2:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: asr r25
				; CHECK-NEXT: ror r24
				; CHECK-NEXT: ror r23
				; CHECK-NEXT: ror r22
				; CHECK-NEXT: asr r25
				; CHECK-NEXT: ror r24
				; CHECK-NEXT: ror r23
				; CHECK-NEXT: ror r22
				; CHECK-NEXT: ret
				%res = ashr i32 %a, 2
				ret i32 %res
				}
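
For readers who want to check the expected results by hand, the following is a minimal C++ sketch of the arithmetic these tests exercise. It is not the code in insertWideShift, which is not shown in this excerpt. It only models a 32-bit value as two 16-bit halves, the same shape as the (srclo, srchi) inputs and (dstlo, dsthi) outputs of the Lsl32, Lsr32 and Asr32 pseudos, and shifts one bit per iteration with a carry between the halves, much like the lsl/rol and lsr/ror chains in the CHECK lines for shifts by 1 and 2. All names here (Halves, split32, join32, lsl32, lsr32, asr32, lsl32_by16) are hypothetical and chosen for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical model, not part of the patch: {lo, hi} 16-bit halves.
using Halves = std::pair<uint16_t, uint16_t>;

Halves split32(uint32_t v) {
  return {static_cast<uint16_t>(v), static_cast<uint16_t>(v >> 16)};
}

uint32_t join32(Halves h) {
  return (static_cast<uint32_t>(h.second) << 16) | h.first;
}

// Logical shift left: the bit leaving the low half enters the high half.
Halves lsl32(Halves h, unsigned cnt) {
  for (unsigned i = 0; i < cnt; ++i) {
    uint16_t carry = static_cast<uint16_t>(h.first >> 15);
    h.first = static_cast<uint16_t>(h.first << 1);
    h.second = static_cast<uint16_t>((h.second << 1) | carry);
  }
  return h;
}

// Logical shift right: the bit leaving the high half enters the low half.
Halves lsr32(Halves h, unsigned cnt) {
  for (unsigned i = 0; i < cnt; ++i) {
    uint16_t carry = static_cast<uint16_t>(h.second & 1);
    h.second = static_cast<uint16_t>(h.second >> 1);
    h.first = static_cast<uint16_t>((h.first >> 1) | (carry << 15));
  }
  return h;
}

// Arithmetic shift right: like lsr32, but the sign bit is preserved.
Halves asr32(Halves h, unsigned cnt) {
  for (unsigned i = 0; i < cnt; ++i) {
    uint16_t carry = static_cast<uint16_t>(h.second & 1);
    uint16_t sign = static_cast<uint16_t>(h.second & 0x8000);
    h.second = static_cast<uint16_t>((h.second >> 1) | sign);
    h.first = static_cast<uint16_t>((h.first >> 1) | (carry << 15));
  }
  return h;
}

// Shift left by exactly 16: the low half becomes the high half and the
// low half is cleared, comparable to the movw/ldi sequence in shl_i32_16.
Halves lsl32_by16(Halves h) {
  return {uint16_t{0}, h.first};
}
```

In this model, join32(lsl32(split32(x), n)) should equal x << n for n below 32, and lsl32_by16 should agree with lsl32(h, 16) while doing no bit-level work, which is why the shift by 16 in the tests needs only register moves.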