This is an archive of the discontinued LLVM Phabricator instance.

Differential D138529

[AVR] Optimize constant 32-bit shifts
Abandoned · Public

Authored by aykevl on Nov 22 2022, 3:50 PM.

Details

Reviewers

dylanmckay
benshi001

Summary

32-bit shift instructions were previously expanded using the default SelectionDAG expander, which meant it used 16-bit constant shifts and ORed them together. This works, but is far from optimal.
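To make the old behaviour concrete, here is a minimal sketch of what "16-bit shifts ORed together" looks like for a left shift. It is an illustration in plain C++, not the actual SelectionDAG expansion; the function name `shl32ViaHalves` is made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Shift a 32-bit value left by a constant amount using only 16-bit halves,
// in the style of a generic expansion. Illustrative only: this is not the
// code that SelectionDAG actually generates.
uint32_t shl32ViaHalves(uint32_t Value, unsigned Amt) {
  assert(Amt < 32 && "shift amount out of range");
  uint16_t Lo = static_cast<uint16_t>(Value);
  uint16_t Hi = static_cast<uint16_t>(Value >> 16);
  uint16_t NewLo, NewHi;
  if (Amt == 0) {
    NewLo = Lo;
    NewHi = Hi;
  } else if (Amt < 16) {
    // Bits that leave the low half are ORed into the high half.
    NewLo = static_cast<uint16_t>(Lo << Amt);
    NewHi = static_cast<uint16_t>((Hi << Amt) | (Lo >> (16 - Amt)));
  } else {
    // The low half moves entirely into the high half.
    NewLo = 0;
    NewHi = static_cast<uint16_t>(Lo << (Amt - 16));
  }
  return (static_cast<uint32_t>(NewHi) << 16) | NewLo;
}
```

On AVR each of these 16-bit shifts is itself expanded into 8-bit instructions, which is where the extra instructions come from.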

The patch here uses a custom instruction inserter to insert optimized constant 32-bit shifts. This is done using three new pseudo-instructions that take the upper and lower halves of the value in two separate 16-bit registers and output the result in two 16-bit registers.

This change results in around 31% fewer instructions on average for constant 32-bit shifts, and is in all cases equal to or better than the old behavior. It also tends to match or outperform avr-gcc: the only cases where avr-gcc does better are when it uses a loop to shift, or when the LLVM register allocator inserts some unnecessary movs. It even outperforms avr-gcc in some cases where avr-gcc does not use a loop.

As a side effect, non-constant 32-bit shifts also become more efficient.

For some real-world differences: the build of compiler-rt I use in TinyGo becomes 2.7% smaller and the build of picolibc I use becomes 0.9% smaller. I think picolibc is a better representation of real-world code, but even a ~1% reduction in code size is really significant.

I have tested this patch in simavr, with each of the 3 kinds of shifts and each of the 30 shift amounts that can be used.

Future patches can improve on this:

A loop can often result in reduced code size at the expense of speed. We should use a loop when the minsize attribute is set and a loop would need fewer instructions.
For some reason the register allocator inserts some unnecessary moves. I think this can be worked around by fiddling with REG_SEQUENCE, but I'm not sure. A better solution might be to try and split 16-bit instructions into 8-bit instructions before register allocation (something which I've tried a few times but haven't found a good solution for yet).
Because the main algorithm is independent of the number of registers, the same code can be used to emit moves for other shift widths (8, 16, 64, and others if needed). But this can be done in later patches if needed.
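The width-independence mentioned in the last point can be sketched outside of LLVM. Below is a plain C++ model that operates on a byte vector (most significant byte first, like the `Regs` array in the patch) and assumes only the two generic steps: whole-byte moves followed by single-bit shifts. It deliberately leaves out the special cases of the real patch (such as the modulo-6/7 and 4-bit tricks), and all names (`Bytes`, `shiftLeft`, `shiftRight`, `toBytes`, `fromBytes`) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One entry per 8-bit register, most significant byte first.
using Bytes = std::vector<uint8_t>;

// Logical left shift of a value of any width.
void shiftLeft(Bytes &Regs, unsigned Amt) {
  assert(Amt < Regs.size() * 8 && "shift amount out of range");
  // Large steps: every 8 bits of shift is just a byte move.
  size_t ByteMoves = Amt / 8;
  for (size_t I = 0; I < Regs.size(); ++I)
    Regs[I] = (I + ByteMoves < Regs.size()) ? Regs[I + ByteMoves] : 0;
  // Small steps: single-bit shifts over the bytes that can still be nonzero.
  size_t Live = Regs.size() - ByteMoves;
  for (unsigned Bit = 0; Bit < Amt % 8; ++Bit) {
    uint8_t Carry = 0;
    for (size_t I = Live; I-- > 0;) { // least significant live byte first
      uint8_t Next = Regs[I] >> 7;
      Regs[I] = static_cast<uint8_t>((Regs[I] << 1) | Carry);
      Carry = Next;
    }
  }
}

// Logical or arithmetic right shift of a value of any width.
void shiftRight(Bytes &Regs, unsigned Amt, bool Arithmetic) {
  assert(Amt < Regs.size() * 8 && "shift amount out of range");
  // Byte used to fill vacated positions: sign extension or zero.
  uint8_t Ext = (Arithmetic && (Regs[0] & 0x80)) ? 0xFF : 0x00;
  size_t ByteMoves = Amt / 8;
  for (size_t I = Regs.size(); I-- > 0;)
    Regs[I] = (I >= ByteMoves) ? Regs[I - ByteMoves] : Ext;
  for (unsigned Bit = 0; Bit < Amt % 8; ++Bit) {
    uint8_t Carry = Ext & 1; // bit shifted into the top live byte
    for (size_t I = ByteMoves; I < Regs.size(); ++I) {
      uint8_t Next = Regs[I] & 1;
      Regs[I] = static_cast<uint8_t>((Regs[I] >> 1) | (Carry << 7));
      Carry = Next;
    }
  }
}

// Helpers to convert between a 32-bit integer and its byte vector.
Bytes toBytes(uint32_t V) {
  return {static_cast<uint8_t>(V >> 24), static_cast<uint8_t>(V >> 16),
          static_cast<uint8_t>(V >> 8), static_cast<uint8_t>(V)};
}

uint32_t fromBytes(const Bytes &B) {
  uint32_t V = 0;
  for (uint8_t X : B)
    V = (V << 8) | X;
  return V;
}
```

Because nothing depends on the vector length, the same two functions also work on an 8-byte vector for a 64-bit value. Note how a large shift reduces the number of bytes that the single-bit loop has to touch.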

This patch is mostly based on an investigation I did a while ago: https://aykevl.nl/2021/02/avr-bitshift

Diff Detail

Unit Tests: Failed

	Time	Test
	8,800 ms	x64 debian > AddressSanitizer-x86_64-linux-dynamic.TestCases::pr33372.cpp
	10,530 ms	x64 debian > AddressSanitizer-x86_64-linux.TestCases::pr33372.cpp
	50 ms	x64 debian > LLVM.CodeGen/AVR::avr-rust-issue-123.ll
	40 ms	x64 debian > LLVM.CodeGen/AVR::return.ll

Event Timeline

aykevl created this revision. Nov 22 2022, 3:50 PM

Herald added a project: Restricted Project. Nov 22 2022, 3:50 PM

Herald added subscribers: Jim, hiraditya.

aykevl requested review of this revision. Nov 22 2022, 3:50 PM

Herald added a subscriber: llvm-commits.

aykevl edited the summary of this revision. Nov 22 2022, 3:53 PM

aykevl edited the summary of this revision. Nov 22 2022, 4:05 PM

Harbormaster completed remote builds in B199078: Diff 477331. Nov 22 2022, 4:37 PM

fixed two tests that were failing as a result of this change
some minor other changes

Harbormaster completed remote builds in B199094: Diff 477359. Nov 22 2022, 5:57 PM

aykevl mentioned this in D138582: [AVR] Do not use R0/R1 on avrtiny. Nov 23 2022, 8:43 AM

dnelson_1901 added a subscriber: dnelson_1901. Nov 25 2022, 9:38 AM

aykevl added a child revision: D138582: [AVR] Do not use R0/R1 on avrtiny. Nov 27 2022, 6:39 AM

rebase (no change in patch) - hopefully the pre-merge check now works?

Harbormaster completed remote builds in B199654: Diff 478103. Nov 27 2022, 8:52 AM

I will have a look next week.

For your commit message, I think the first part is OK, but the part from "Future patches can improve on this:" to the end does not need to be committed.

benshi001 added inline comments. Dec 4 2022, 11:40 PM

llvm/lib/Target/AVR/AVRISelLowering.cpp
286	This message should be `Expected power-of-2 shift operand`; you can also fix it along with your patch.
llvm/lib/Target/AVR/AVRISelLowering.h
42	`LSLW` is easily confused with `LSLWN`, so how about `LSLWP`, which means `LSLW-Pair`?

benshi001 added inline comments. Dec 5 2022, 2:30 AM

llvm/lib/Target/AVR/AVRISelLowering.cpp
1959	How about using `GetZeroRegister()` instead of fixed `AVR::R1`? This way should also fit avrtiny. Or we can emit `eor Rx, Rx` instead of `mov Rx, Zero` .

benshi001 added inline comments. Dec 5 2022, 2:33 AM

llvm/lib/Target/AVR/AVRISelLowering.cpp
1862	How about using GetZeroRegister() instead of fixed AVR::R1? This way should also fit avrtiny. Or we can emit eor Rx, Rx instead of mov Rx, Zero .
1915	How about using GetZeroRegister() instead of fixed AVR::R1? This way should also fit avrtiny. Or we can emit eor Rx, Rx instead of mov Rx, Zero .
1986	How about using GetZeroRegister() instead of fixed AVR::R1? This way should also fit avrtiny. Or we can emit eor Rx, Rx instead of mov Rx, Zero .

Is it possible to break this large patch into smaller ones, in which the first one is just the skeleton (with only shiftAmount = 1 implemented as the basic case)? Later ones can then implement each special shiftAmount case.

benshi001 added inline comments. Dec 5 2022, 4:14 AM

llvm/lib/Target/AVR/AVRISelLowering.cpp
291	llvm_unreachable("Expected a constant shift amount!");
302	Is it OK for `ShiftAmount` to have type `MVT::i8`? Would `MVT::i16` be better?
314	Would it be better to use `std::map` or an equivalent but more efficient LLVM utility?

use getZeroRegister instead of AVR::R1
use AVR::sub_lo and AVR::sub_hi instead of numeric constants
optimize REG_SEQUENCE

Thank you for taking a look! I will update the patch later with your suggestions. For now, I've updated the patch a bit with changes I made locally in the past few days.
(I will also update the patch with extra context that I forgot to add this time).

llvm/lib/Target/AVR/AVRISelLowering.cpp
302	32-bit shift instructions will shift at most 31 bits, so i8 is good enough.
314	Do you know of any? Other code seems to use a `switch` and I think this is very readable.
1862	This was already fixed in my local changes (this patch is older than D138582).

In D138529#3970417, @benshi001 wrote:

Is it possible to break this large patch into smaller ones, in which the first one is just the skeleton (with only shiftAmount = 1 implemented as the basic case)? Later ones can then implement each special shiftAmount case.

I didn't split the patch because this way, there is no code size regression. But if you prefer, I can split the patch into multiple patches and commit them all at the same time once accepted.

EDIT: yeah I agree, splitting this patch is better. I'll do that in the next update.

benshi001 added inline comments. Dec 5 2022, 6:43 PM

llvm/lib/Target/AVR/AVRISelLowering.cpp

314

Though switch is more readable, it seems boring, in my opinion.

So I suggest

std::map<unsigned, unsigned> OpMap = {{ISD::SHL, AVRISD::LSLW},
                                      {ISD::SRL, AVRISD::LSRW}, 
                                      {ISD::SRA, AVRISD::ASRW}};
assert(OpMap.find(Op.getOpcode()) != OpMap.end() &&
       "Unexpected shift opcode");
unsigned Opc = OpMap[Op.getOpcode()];

which looks more clear.
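For readers comparing the two styles under discussion, here is a self-contained sketch of both. The enums are hypothetical stand-ins for the `ISD::` and `AVRISD::` opcodes, so this is not code from the patch. One thing worth noting about the map variant as suggested above: `operator[]` on a non-const `std::map` inserts a default value when the key is missing, and the `assert` disappears in release builds, so this sketch uses `find` on a `static const` map instead.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-ins for the generic and AVR-specific opcode enums.
enum GenericOpcode : unsigned { SHL, SRL, SRA, ADD };
enum WideOpcode : unsigned { LSLW = 100, LSRW, ASRW, InvalidWide };

// Variant 1: a plain switch, as used in the patch.
unsigned selectWithSwitch(unsigned Opcode) {
  switch (Opcode) {
  case SHL:
    return LSLW;
  case SRL:
    return LSRW;
  case SRA:
    return ASRW;
  default:
    return InvalidWide;
  }
}

// Variant 2: a lookup table, built once and never modified.
unsigned selectWithMap(unsigned Opcode) {
  static const std::map<unsigned, unsigned> OpMap = {
      {SHL, LSLW}, {SRL, LSRW}, {SRA, ASRW}};
  auto It = OpMap.find(Opcode);
  return It == OpMap.end() ? InvalidWide : It->second;
}
```

Both variants return the same result for every input; the switch avoids any run-time container, which may be part of why other code in the backend uses it.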

benshi001 added inline comments. Dec 5 2022, 8:04 PM

llvm/lib/Target/AVR/AVRISelLowering.cpp
2146	Do your added tests cover all of these special conditions?

benshi001 added inline comments. Dec 5 2022, 11:45 PM

llvm/test/CodeGen/AVR/shift32.ll
3	Also check for AVRTiny?

benshi001 added inline comments. Dec 6 2022, 12:17 AM

llvm/lib/Target/AVR/AVRISelLowering.cpp
1843	I think using a negative ShiftAmt for a left shift is counter-intuitive. It would be better to add one more argument `bool leftShift`, or to directly expose the opcode `{shl, lshr, ashr}`.

benshi001 added inline comments. Dec 6 2022, 12:24 AM

llvm/lib/Target/AVR/AVRISelLowering.cpp
1861	Renaming `ShiftBytes` to `ShiftRegsSize` looks better.
1906	Renaming `ShiftBytes` to `ShiftRegsSize` looks better.

benshi001 added inline comments. Dec 6 2022, 12:36 AM

llvm/lib/Target/AVR/AVRISelLowering.cpp
1855	Would it be better to split this large function `insertMultibyteShift` into smaller ones, one for each `if` statement? For example, this `if` could be a standalone function `insertMultibyteShiftMod6_7` (or any better name).

benshi001 added inline comments. Dec 7 2022, 2:54 AM

llvm/lib/Target/AVR/AVRISelLowering.cpp
1934	This `if` statement is not tested for ashr. It would be better to add an ashr6 or ashr30 test.

benshi001 added inline comments. Dec 7 2022, 2:58 AM

llvm/lib/Target/AVR/AVRISelLowering.cpp
1949	`Ext` and `ExtMore` seem confusing. Would it be better to rename `Ext` to `HighByte` and `ExtMore` to `ExtByte`?

benshi001 added inline comments. Dec 7 2022, 3:07 AM

llvm/lib/Target/AVR/AVRISelLowering.cpp
2008	So there should be an `assert((ShiftAmt > -8 && ShiftAmt < 8) && "Unexpected shift amount");` here.

benshi001 added inline comments. Dec 7 2022, 3:41 AM

llvm/lib/Target/AVR/AVRISelLowering.cpp
2004	I am a bit confused here. With `Regs = Regs.slice(1, Regs.size() - 1);`, the element at index 0 is dropped and only 3 elements are left in `Regs`?

I created a series of patches to replace this one, for easier reviewing. Thank you for the thorough review!
See: D140569, D140570, D140571, D140572, D140573

llvm/lib/Target/AVR/AVRISelLowering.cpp
314	I'm not really convinced this is more readable. I like boring code, it means it is easy to understand.
1843	Hmm, you have a point here. But I kind of like using the signedness of the value. What do you think of renaming it to `ShiftRightAmt` instead? Then it's clearer that a negative number will be a left shift. I prefer to avoid opcodes because that would make the code less generic (I want to write a 64-bit shift later, for example).
1855	What would be the benefit of that?
1861	👍 sounds good
1934	It is, it is tested by `ashr_i32_30`. I have added a few more tests in my updated code.
1949	Fair enough. I'll update the code with these new names and some extra comments to explain what they do.
2004	Yes. I'll update the code to use `drop_front` and `drop_back` instead.
2008	👍 seems fine by me.
2146	Not sure. I did test all cases locally (all 93 cases), and in every case this either leaves code size unchanged or improves it. You can see a number of tests in D140573 that are optimized this way. In any case, this is an optimization: as long as correct code is generated in both cases, the condition is purely a heuristic.
llvm/test/CodeGen/AVR/shift32.ll
3	The only difference is the lack of `movw` (which isn't emitted by this patch; instead it uses `AVR::COPY`, which is lowered to either `movw` or two `mov` instructions depending on support). All other instructions are supported by avrtiny, so I do not think testing for AVRTiny is necessary here. (I could add it of course if you think it's useful, but the test output will likely be near-identical.)
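The signed shift amount convention debated for line 1843 can be illustrated with a small sketch. This is a model on plain 32-bit integers, not the register-based code from the patch; `shiftSigned` and the three wrappers are hypothetical names, and the parameter follows the proposed `ShiftRightAmt` naming (positive shifts right, negative shifts left).

```cpp
#include <cassert>
#include <cstdint>

// One helper for all three shift kinds: a positive ShiftRightAmt shifts
// right, a negative one shifts left. Arithmetic only affects right shifts.
uint32_t shiftSigned(uint32_t Value, int ShiftRightAmt, bool Arithmetic) {
  assert(ShiftRightAmt > -32 && ShiftRightAmt < 32 &&
         "Unexpected shift amount");
  if (ShiftRightAmt < 0)
    return Value << -ShiftRightAmt;
  if (ShiftRightAmt == 0)
    return Value;
  uint32_t Result = Value >> ShiftRightAmt;
  if (Arithmetic && (Value & 0x80000000u))
    Result |= ~(0xFFFFFFFFu >> ShiftRightAmt); // fill vacated bits with sign
  return Result;
}

// Opcode-style entry points map onto the signed convention.
uint32_t shl(uint32_t V, unsigned Amt) {
  return shiftSigned(V, -static_cast<int>(Amt), false);
}
uint32_t lshr(uint32_t V, unsigned Amt) {
  return shiftSigned(V, static_cast<int>(Amt), false);
}
uint32_t ashr(uint32_t V, unsigned Amt) {
  return shiftSigned(V, static_cast<int>(Amt), true);
}
```

The wrappers show that the opcode-based interface suggested in the review can sit on top of the signed one, so the two proposals are not mutually exclusive.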

aykevl mentioned this in D140569: [AVR] Custom lower 32-bit shift instructions. Dec 26 2022, 9:55 AM

You can close this patch.

Closing, this has been replaced by multiple smaller patches.

Revision Contents

Path	Size
llvm/lib/Target/AVR/AVRISelLowering.h	5 lines
llvm/lib/Target/AVR/AVRISelLowering.cpp	377 lines
llvm/lib/Target/AVR/AVRInstrInfo.td	18 lines
llvm/test/CodeGen/AVR/shift32.ll	422 lines

Diff 477331

llvm/lib/Target/AVR/AVRISelLowering.h

(33 lines hidden)	enum NodeType {
CALL,		CALL,
/// A wrapper node for TargetConstantPool,		/// A wrapper node for TargetConstantPool,
/// TargetExternalSymbol, and TargetGlobalAddress.		/// TargetExternalSymbol, and TargetGlobalAddress.
WRAPPER,		WRAPPER,
LSL, ///< Logical shift left.		LSL, ///< Logical shift left.
LSLBN, ///< Byte logical shift left N bits.		LSLBN, ///< Byte logical shift left N bits.
LSLWN, ///< Word logical shift left N bits.		LSLWN, ///< Word logical shift left N bits.
LSLHI, ///< Higher 8-bit of word logical shift left.		LSLHI, ///< Higher 8-bit of word logical shift left.
		LSLW, ///< Wide logical shift left.
LSR, ///< Logical shift right.		LSR, ///< Logical shift right.
LSRBN, ///< Byte logical shift right N bits.		LSRBN, ///< Byte logical shift right N bits.
LSRWN, ///< Word logical shift right N bits.		LSRWN, ///< Word logical shift right N bits.
LSRLO, ///< Lower 8-bit of word logical shift right.		LSRLO, ///< Lower 8-bit of word logical shift right.
		LSRW, ///< Wide logical shift right.
ASR, ///< Arithmetic shift right.		ASR, ///< Arithmetic shift right.
ASRBN, ///< Byte arithmetic shift right N bits.		ASRBN, ///< Byte arithmetic shift right N bits.
ASRWN, ///< Word arithmetic shift right N bits.		ASRWN, ///< Word arithmetic shift right N bits.
ASRLO, ///< Lower 8-bit of word arithmetic shift right.		ASRLO, ///< Lower 8-bit of word arithmetic shift right.
		ASRW, ///< Wide arithmetic shift right.
ROR, ///< Bit rotate right.		ROR, ///< Bit rotate right.
ROL, ///< Bit rotate left.		ROL, ///< Bit rotate left.
LSLLOOP, ///< A loop of single logical shift left instructions.		LSLLOOP, ///< A loop of single logical shift left instructions.
LSRLOOP, ///< A loop of single logical shift right instructions.		LSRLOOP, ///< A loop of single logical shift right instructions.
ROLLOOP, ///< A loop of single left bit rotate instructions.		ROLLOOP, ///< A loop of single left bit rotate instructions.
RORLOOP, ///< A loop of single right bit rotate instructions.		RORLOOP, ///< A loop of single right bit rotate instructions.
ASRLOOP, ///< A loop of single arithmetic shift right instructions.		ASRLOOP, ///< A loop of single arithmetic shift right instructions.
/// AVR conditional branches. Operand 0 is the chain operand, operand 1		/// AVR conditional branches. Operand 0 is the chain operand, operand 1
(123 lines hidden)	SDValue LowerCallResult(SDValue Chain, SDValue InFlag,
const SDLoc &dl, SelectionDAG &DAG,		const SDLoc &dl, SelectionDAG &DAG,
SmallVectorImpl<SDValue> &InVals) const;		SmallVectorImpl<SDValue> &InVals) const;

protected:		protected:
const AVRSubtarget &Subtarget;		const AVRSubtarget &Subtarget;

private:		private:
MachineBasicBlock *insertShift(MachineInstr &MI, MachineBasicBlock *BB) const;		MachineBasicBlock *insertShift(MachineInstr &MI, MachineBasicBlock *BB) const;
		MachineBasicBlock *insertWideShift(MachineInstr &MI,
		MachineBasicBlock *BB) const;
MachineBasicBlock *insertMul(MachineInstr &MI, MachineBasicBlock *BB) const;		MachineBasicBlock *insertMul(MachineInstr &MI, MachineBasicBlock *BB) const;
MachineBasicBlock *insertCopyR1(MachineInstr &MI,		MachineBasicBlock *insertCopyR1(MachineInstr &MI,
MachineBasicBlock *BB) const;		MachineBasicBlock *BB) const;
MachineBasicBlock *insertAtomicArithmeticOp(MachineInstr &MI,		MachineBasicBlock *insertAtomicArithmeticOp(MachineInstr &MI,
MachineBasicBlock *BB,		MachineBasicBlock *BB,
unsigned Opcode, int Width) const;		unsigned Opcode, int Width) const;
};		};

} // end namespace llvm		} // end namespace llvm

#endif // LLVM_AVR_ISEL_LOWERING_H		#endif // LLVM_AVR_ISEL_LOWERING_H

llvm/lib/Target/AVR/AVRISelLowering.cpp

(82 lines hidden)	AVRTargetLowering::AVRTargetLowering(const AVRTargetMachine &TM,
// our shift instructions are only able to shift 1 bit at a time, so handle		// our shift instructions are only able to shift 1 bit at a time, so handle
// this in a custom way.		// this in a custom way.
setOperationAction(ISD::SRA, MVT::i8, Custom);		setOperationAction(ISD::SRA, MVT::i8, Custom);
setOperationAction(ISD::SHL, MVT::i8, Custom);		setOperationAction(ISD::SHL, MVT::i8, Custom);
setOperationAction(ISD::SRL, MVT::i8, Custom);		setOperationAction(ISD::SRL, MVT::i8, Custom);
setOperationAction(ISD::SRA, MVT::i16, Custom);		setOperationAction(ISD::SRA, MVT::i16, Custom);
setOperationAction(ISD::SHL, MVT::i16, Custom);		setOperationAction(ISD::SHL, MVT::i16, Custom);
setOperationAction(ISD::SRL, MVT::i16, Custom);		setOperationAction(ISD::SRL, MVT::i16, Custom);
		setOperationAction(ISD::SRA, MVT::i32, Custom);
		setOperationAction(ISD::SHL, MVT::i32, Custom);
		setOperationAction(ISD::SRL, MVT::i32, Custom);
setOperationAction(ISD::SHL_PARTS, MVT::i16, Expand);		setOperationAction(ISD::SHL_PARTS, MVT::i16, Expand);
setOperationAction(ISD::SRA_PARTS, MVT::i16, Expand);		setOperationAction(ISD::SRA_PARTS, MVT::i16, Expand);
setOperationAction(ISD::SRL_PARTS, MVT::i16, Expand);		setOperationAction(ISD::SRL_PARTS, MVT::i16, Expand);

setOperationAction(ISD::ROTL, MVT::i8, Custom);		setOperationAction(ISD::ROTL, MVT::i8, Custom);
setOperationAction(ISD::ROTL, MVT::i16, Expand);		setOperationAction(ISD::ROTL, MVT::i16, Expand);
setOperationAction(ISD::ROTR, MVT::i8, Custom);		setOperationAction(ISD::ROTR, MVT::i8, Custom);
setOperationAction(ISD::ROTR, MVT::i16, Expand);		setOperationAction(ISD::ROTR, MVT::i16, Expand);
(157 lines hidden)	default:
NODE(ROLLOOP);		NODE(ROLLOOP);
NODE(RORLOOP);		NODE(RORLOOP);
NODE(ASRLOOP);		NODE(ASRLOOP);
NODE(BRCOND);		NODE(BRCOND);
NODE(CMP);		NODE(CMP);
NODE(CMPC);		NODE(CMPC);
NODE(TST);		NODE(TST);
NODE(SELECT_CC);		NODE(SELECT_CC);
		NODE(LSLW);
		NODE(LSRW);
		NODE(ASRW);
#undef NODE		#undef NODE
}		}
}		}

EVT AVRTargetLowering::getSetCCResultType(const DataLayout &DL, LLVMContext &,		EVT AVRTargetLowering::getSetCCResultType(const DataLayout &DL, LLVMContext &,
EVT VT) const {		EVT VT) const {
assert(!VT.isVector() && "No AVR SetCC type for vectors!");		assert(!VT.isVector() && "No AVR SetCC type for vectors!");
return MVT::i8;		return MVT::i8;
}		}

SDValue AVRTargetLowering::LowerShifts(SDValue Op, SelectionDAG &DAG) const {		SDValue AVRTargetLowering::LowerShifts(SDValue Op, SelectionDAG &DAG) const {
unsigned Opc8;		unsigned Opc8;
const SDNode *N = Op.getNode();		const SDNode *N = Op.getNode();
EVT VT = Op.getValueType();		EVT VT = Op.getValueType();
SDLoc dl(N);		SDLoc dl(N);
assert(isPowerOf2_32(VT.getSizeInBits()) &&		assert(isPowerOf2_32(VT.getSizeInBits()) &&
"Expected power-of-2 shift amount");		"Expected power-of-2 shift amount");

		if (VT.getSizeInBits() == 32) {
		if (!isa<ConstantSDNode>(N->getOperand(1))) {
		// 32-bit shifts are converted to a loop in IR.
		llvm_unreachable("Expected a constant shift!");
		}
		SDVTList ResTys = DAG.getVTList(MVT::i16, MVT::i16);
		SDValue SrcLo =
		DAG.getNode(ISD::EXTRACT_ELEMENT, dl, MVT::i16, Op.getOperand(0),
		DAG.getConstant(0, dl, MVT::i16));
		SDValue SrcHi =
		DAG.getNode(ISD::EXTRACT_ELEMENT, dl, MVT::i16, Op.getOperand(0),
		DAG.getConstant(1, dl, MVT::i16));
		uint64_t ShiftAmount =
		cast<ConstantSDNode>(N->getOperand(1))->getZExtValue();
		SDValue Cnt = DAG.getTargetConstant(ShiftAmount, dl, MVT::i8);
		unsigned Opc;
		switch (Op.getOpcode()) {
		default:
		llvm_unreachable("Invalid 32-bit shift opcode!");
		case ISD::SHL:
		Opc = AVRISD::LSLW;
		break;
		case ISD::SRL:
		Opc = AVRISD::LSRW;
		break;
		case ISD::SRA:
		Opc = AVRISD::ASRW;
		break;
		}
		SDValue Result = DAG.getNode(Opc, dl, ResTys, SrcLo, SrcHi, Cnt);
		return DAG.getNode(ISD::BUILD_PAIR, dl, MVT::i32, Result.getValue(0),
		Result.getValue(1));
		}

// Expand non-constant shifts to loops.		// Expand non-constant shifts to loops.
if (!isa<ConstantSDNode>(N->getOperand(1))) {		if (!isa<ConstantSDNode>(N->getOperand(1))) {
switch (Op.getOpcode()) {		switch (Op.getOpcode()) {
default:		default:
llvm_unreachable("Invalid shift opcode!");		llvm_unreachable("Invalid shift opcode!");
case ISD::SHL:		case ISD::SHL:
return DAG.getNode(AVRISD::LSLLOOP, dl, VT, N->getOperand(0),		return DAG.getNode(AVRISD::LSLLOOP, dl, VT, N->getOperand(0),
N->getOperand(1));		N->getOperand(1));
(1,486 lines hidden)	MachineBasicBlock *AVRTargetLowering::insertShift(MachineInstr &MI,

BuildMI(CheckBB, dl, TII.get(AVR::DECRd), ShiftAmtReg2).addReg(ShiftAmtReg);		BuildMI(CheckBB, dl, TII.get(AVR::DECRd), ShiftAmtReg2).addReg(ShiftAmtReg);
BuildMI(CheckBB, dl, TII.get(AVR::BRPLk)).addMBB(LoopBB);		BuildMI(CheckBB, dl, TII.get(AVR::BRPLk)).addMBB(LoopBB);

MI.eraseFromParent(); // The pseudo instruction is gone now.		MI.eraseFromParent(); // The pseudo instruction is gone now.
return RemBB;		return RemBB;
}		}

		// Do a multibyte AVR shift. Insert shift instructions and put the output
		// registers in the Regs array.
		// Because AVR does not have a normal shift instruction (only a single bit shift
		// instruction), we have to emulate this behavior with other instructions.
		// It first tries large steps (moving registers around) and then smaller steps
		// like single bit shifts.
		// Large shifts actually reduce the number of shifted registers, so the below
		// algorithms have to work independently of the number of registers that are
		// shifted.
		// For more information and background, see this blogpost:
		// https://aykevl.nl/2021/02/avr-bitshift
		static void insertMultibyteShift(MachineInstr &MI, MachineBasicBlock *BB, MutableArrayRef<std::pair<Register, int>> Regs, int64_t ShiftAmt, bool ArithmeticShift) {
		const TargetInstrInfo &TII = *BB->getParent()->getSubtarget().getInstrInfo();
		MachineRegisterInfo &MRI = BB->getParent()->getRegInfo();
		DebugLoc dl = MI.getDebugLoc();

		// Do a shift modulo 6 or 7. This is a bit more complicated than most shifts
		// and is hard to compose with the rest, so these are special cased.
		// The basic idea is to shift one or two bits in the opposite direction and
		// then move registers around to get the correct end result.
		if (ShiftAmt < 0 && (-ShiftAmt % 8) >= 6) {
		// Left shift modulo 6 or 7.

		// Create a slice of the registers we're going to modify, to ease working
		// with them.
		size_t ShiftRegsOffset = -ShiftAmt / 8;
		size_t ShiftBytes = Regs.size() - ShiftRegsOffset;
		MutableArrayRef<std::pair<Register, int>> ShiftRegs =
		Regs.slice(ShiftRegsOffset, ShiftBytes);

		// Shift one to the right, keeping the least significant bit as the carry
		// bit.
		insertMultibyteShift(MI, BB, ShiftRegs, 1, false);

		// Create zero register.
		Register Zero = MRI.createVirtualRegister(&AVR::GPR8RegClass);
		BuildMI(*BB, MI, dl, TII.get(AVR::COPY), Zero).addReg(AVR::R1);

		// Rotate the least significant bit from the carry bit into a new register
		// (that starts out zero).
		Register LowByte = MRI.createVirtualRegister(&AVR::GPR8RegClass);
		BuildMI(*BB, MI, dl, TII.get(AVR::RORRd), LowByte).addReg(Zero);

		// Shift one more to the right if this is a modulo-6 shift.
		if (-ShiftAmt % 8 == 6) {
		insertMultibyteShift(MI, BB, ShiftRegs, 1, false);
		Register NewLowByte = MRI.createVirtualRegister(&AVR::GPR8RegClass);
		BuildMI(*BB, MI, dl, TII.get(AVR::RORRd), NewLowByte).addReg(LowByte);
		LowByte = NewLowByte;
		}

		// Move all registers to the left, zeroing the bottom registers as needed.
		for (size_t I = 0; I < Regs.size(); I++) {
		int Idx = I + 1;
		if (Idx < (int)ShiftRegs.size()) {
		Regs[I] = ShiftRegs[Idx];
		} else if (Idx == (int)ShiftRegs.size()) {
		Regs[I] = std::pair(LowByte, 0);
		} else {
		Regs[I] = std::pair(Zero, 0);
		}
		}

		return;
		}

		// Right shift modulo 6 or 7.
		if (ShiftAmt > 0 && (ShiftAmt % 8) >= 6) {
		// Create a view on the registers we're going to modify, to ease working
		// with them.
		size_t ShiftBytes = Regs.size() - (ShiftAmt / 8);
		MutableArrayRef<std::pair<Register, int>> ShiftRegs =
		Regs.slice(0, ShiftBytes);

		// Shift one to the left.
		insertMultibyteShift(MI, BB, ShiftRegs, -1, false);

		// Sign or zero extend the most significant register into a new register.
		Register Ext = MRI.createVirtualRegister(&AVR::GPR8RegClass);
		Register ExtMore = 0;
		if (ArithmeticShift) {
		// Sign-extend bit that was shifted out last.
		BuildMI(*BB, MI, dl, TII.get(AVR::SBCRdRr), Ext)
		.addReg(Ext, RegState::Undef)
		.addReg(Ext, RegState::Undef);
		ExtMore = Ext;
		} else {
		// Create a new zero register for zero extending.
		ExtMore = MRI.createVirtualRegister(&AVR::GPR8RegClass);
		BuildMI(*BB, MI, dl, TII.get(AVR::COPY), ExtMore).addReg(AVR::R1);
		// Rotate most significant bit into a new register (that starts out zero).
		BuildMI(*BB, MI, dl, TII.get(AVR::ADCRdRr), Ext)
		.addReg(ExtMore)
		.addReg(ExtMore);
		}

		// Shift one more to the left for modulo 6 shifts.
		if (ShiftAmt % 8 == 6) {
		insertMultibyteShift(MI, BB, ShiftRegs, -1, false);
		Register NewExt = MRI.createVirtualRegister(&AVR::GPR8RegClass);
		BuildMI(*BB, MI, dl, TII.get(AVR::ADCRdRr), NewExt)
		.addReg(Ext)
		.addReg(Ext);
		Ext = NewExt;
		}

		// Move all to the right, while sign or zero extending.
		for (int I = Regs.size() - 1; I >= 0; I--) {
		int Idx = I - (Regs.size() - ShiftRegs.size()) - 1;
		if (Idx >= 0) {
		Regs[I] = ShiftRegs[Idx];
		} else if (Idx == -1) {
		Regs[I] = std::pair(Ext, 0);
		} else {
		Regs[I] = std::pair(ExtMore, 0);
		}
		}

		return;
		}

		// For shift amounts of at least one register, simply rename the registers and
		// zero the bottom registers.
		auto MSBReg = Regs[0];
		Register ShrExtendReg = 0;
		while (ShiftAmt <= -8) {
		// Move all registers one to the left.
		for (size_t I = 0; I < Regs.size() - 1; I++) {
		Regs[I] = Regs[I + 1];
		}

		// Zero the least significant register.
		Register Out = MRI.createVirtualRegister(&AVR::GPR8RegClass);
		BuildMI(*BB, MI, dl, TII.get(AVR::COPY), Out).addReg(AVR::R1);
		Regs[Regs.size() - 1] = std::pair(Out, 0);

		// Continue shifts with the leftover registers.
		Regs = Regs.slice(0, Regs.size() - 1);

		ShiftAmt += 8;
		}
		while (ShiftAmt >= 8) {
		// Move all registers one to the right.
		for (size_t I = Regs.size() - 1; I != 0; I--) {
		Regs[I] = Regs[I - 1];
		}

		// Zero or sign extend the most significant register.
		if (ShrExtendReg == 0) {
		ShrExtendReg = MRI.createVirtualRegister(&AVR::GPR8RegClass);
		if (ArithmeticShift) {
		// Sign extend the most significant register into ShrExtendReg.
		Register Tmp = MRI.createVirtualRegister(&AVR::GPR8RegClass);
		BuildMI(*BB, MI, dl, TII.get(AVR::ADDRdRr), Tmp)
		.addReg(MSBReg.first, 0, MSBReg.second)
		.addReg(MSBReg.first, 0, MSBReg.second);
		BuildMI(*BB, MI, dl, TII.get(AVR::SBCRdRr), ShrExtendReg)
		.addReg(Tmp)
		.addReg(Tmp);
		} else {
		BuildMI(*BB, MI, dl, TII.get(AVR::COPY), ShrExtendReg).addReg(AVR::R1);
		}
		}
benshi001: How about using `GetZeroRegister()` instead of the fixed `AVR::R1`? That should also fit avrtiny. Or we can emit `eor Rx, Rx` instead of `mov Rx, Zero`.
		Regs[0] = std::pair(ShrExtendReg, 0);

		// Continue shifts with the leftover registers.
		Regs = Regs.slice(1, Regs.size() - 1);

		ShiftAmt -= 8;
		}

		// Shift by four bits, using a complicated swap/eor/andi/eor sequence.
		// It only works for logical shifts because the bits shifted in are all
		// zeroes.
		// Example shifting 16 bits (2 bytes):
		//
		// ; shift r1
		// swap r1
		// andi r1, 0xf0
		// ; shift r0
		// swap r0
		// eor r1, r0
		// andi r0, 0xf0
		// eor r1, r0
benshi001: I am a bit confused here. With `Regs = Regs.slice(1, Regs.size() - 1);`, the element at index 0 is dropped, so only 3 elements are left in `Regs`?
aykevl: Yes. I'll update the code to use `drop_front` and `drop_back` instead.
		if (!ArithmeticShift && (ShiftAmt <= -4 || ShiftAmt >= 4)) {
benshi001: So there should be an `assert((ShiftAmt > -8 && ShiftAmt < 8) && "Unexpect shift amount");` here.
aykevl: 👍 Seems fine by me.
		Register Prev = 0;
		for (size_t i = 0; i < Regs.size(); i++) {
		size_t Idx = (ShiftAmt < 0) ? i : Regs.size() - i - 1;
		Register SwapReg = MRI.createVirtualRegister(&AVR::LD8RegClass);
		BuildMI(*BB, MI, dl, TII.get(AVR::SWAPRd), SwapReg)
		.addReg(Regs[Idx].first, 0, Regs[Idx].second);
		if (Prev != 0) {
		Register R = MRI.createVirtualRegister(&AVR::GPR8RegClass);
		BuildMI(*BB, MI, dl, TII.get(AVR::EORRdRr), R)
		.addReg(Prev)
		.addReg(SwapReg);
		Prev = R;
		}
		Register AndReg = MRI.createVirtualRegister(&AVR::LD8RegClass);
		BuildMI(*BB, MI, dl, TII.get(AVR::ANDIRdK), AndReg)
		.addReg(SwapReg)
		.addImm((ShiftAmt < 0) ? 0xf0 : 0x0f);
		if (Prev != 0) {
		Register R = MRI.createVirtualRegister(&AVR::GPR8RegClass);
		BuildMI(*BB, MI, dl, TII.get(AVR::EORRdRr), R)
		.addReg(Prev)
		.addReg(AndReg);
		if (ShiftAmt < 0) { // left shift
		Regs[Idx - 1] = std::pair(R, 0);
		} else { // right shift
		Regs[Idx + 1] = std::pair(R, 0);
		}
		}
		Prev = AndReg;
		Regs[Idx] = std::pair(AndReg, 0);
		}
		if (ShiftAmt < 0) {
		ShiftAmt += 4;
		} else {
		ShiftAmt -= 4;
		}
		}

		// Shift by one. This is the fallback that always works, and the shift
		// operation that is used for 1, 2, and 3 bit shifts.
		while (ShiftAmt < 0) {
		// Shift one to the left.
		for (size_t i = 0; i < Regs.size(); i++) {
		size_t Idx = Regs.size() - i - 1;
		Register Out = MRI.createVirtualRegister(&AVR::GPR8RegClass);
		Register In = Regs[Idx].first;
		Register InSubreg = Regs[Idx].second;
		if (i == 0) {
		BuildMI(*BB, MI, dl, TII.get(AVR::ADDRdRr), Out)
		.addReg(In, 0, InSubreg)
		.addReg(In, 0, InSubreg);
		} else {
		BuildMI(*BB, MI, dl, TII.get(AVR::ADCRdRr), Out)
		.addReg(In, 0, InSubreg)
		.addReg(In, 0, InSubreg);
		}
		Regs[Idx] = std::pair(Out, 0);
		}
		ShiftAmt++;
		}
		while (ShiftAmt > 0) {
		// Shift one to the right.
		for (size_t i = 0; i < Regs.size(); i++) {
		Register Out = MRI.createVirtualRegister(&AVR::GPR8RegClass);
		Register In = Regs[i].first;
		Register InSubreg = Regs[i].second;
		if (i == 0) {
		unsigned Opc = ArithmeticShift ? AVR::ASRRd : AVR::LSRRd;
		BuildMI(*BB, MI, dl, TII.get(Opc), Out).addReg(In, 0, InSubreg);
		} else {
		BuildMI(*BB, MI, dl, TII.get(AVR::RORRd), Out).addReg(In, 0, InSubreg);
		}
		Regs[i] = std::pair(Out, 0);
		}
		ShiftAmt--;
		}

		if (ShiftAmt != 0) {
		llvm_unreachable("don't know how to shift!"); // sanity check
		}
		}
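The swap/eor/andi/eor sequence above is easier to follow when modeled on plain bytes. The following is a minimal sketch, not part of the patch: `modelNibbleShift` and `modelNibbleShift32` are hypothetical names, and the model assumes the same layout as `Regs` (most significant byte at index 0). It mirrors the loop structure of the 4-bit shift and can be compared against an ordinary `<< 4` or `>> 4`.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Bytes4 = std::array<uint8_t, 4>;

// Models the AVR `swap` instruction: exchange the two nibbles of a byte.
static uint8_t swapNibbles(uint8_t V) {
  return static_cast<uint8_t>((V << 4) | (V >> 4));
}

// Hypothetical model of the 4-bit logical shift. Regs[0] is the most
// significant byte. Left == true models a left shift, otherwise a right shift.
Bytes4 modelNibbleShift(Bytes4 Regs, bool Left) {
  const uint8_t Mask = Left ? 0xf0 : 0x0f;
  bool HavePrev = false;
  uint8_t Prev = 0;
  for (size_t I = 0; I < Regs.size(); ++I) {
    // Left shifts start at the most significant byte, right shifts at the
    // least significant byte, as in the patch.
    size_t Idx = Left ? I : Regs.size() - I - 1;
    uint8_t Swapped = swapNibbles(Regs[Idx]);
    if (HavePrev) // first eor: mix the swapped byte into the previous result
      Prev = static_cast<uint8_t>(Prev ^ Swapped);
    uint8_t Masked = static_cast<uint8_t>(Swapped & Mask); // andi
    if (HavePrev) {
      // second eor: cancels the nibble that stays in the current byte, so the
      // neighbour only receives the nibble that crosses the byte boundary
      size_t Neighbor = Left ? Idx - 1 : Idx + 1;
      Regs[Neighbor] = static_cast<uint8_t>(Prev ^ Masked);
    }
    Prev = Masked;
    HavePrev = true;
    Regs[Idx] = Masked;
  }
  return Regs;
}

// Convenience wrapper that packs and unpacks a 32-bit value.
uint32_t modelNibbleShift32(uint32_t V, bool Left) {
  Bytes4 Regs = {static_cast<uint8_t>(V >> 24), static_cast<uint8_t>(V >> 16),
                 static_cast<uint8_t>(V >> 8), static_cast<uint8_t>(V)};
  Regs = modelNibbleShift(Regs, Left);
  return (static_cast<uint32_t>(Regs[0]) << 24) |
         (static_cast<uint32_t>(Regs[1]) << 16) |
         (static_cast<uint32_t>(Regs[2]) << 8) | Regs[3];
}
```

The model also shows why the trick is restricted to logical shifts: the `andi` mask always clears the vacated nibble, so there is no way to shift in copies of the sign bit.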

		// Do a wide (32-bit) shift.
		MachineBasicBlock *
		AVRTargetLowering::insertWideShift(MachineInstr &MI,
		MachineBasicBlock *BB) const {
		const TargetInstrInfo &TII = *Subtarget.getInstrInfo();
		DebugLoc dl = MI.getDebugLoc();

		// How much to shift to the right (meaning: a negative number indicates a left
		// shift).
		int64_t ShiftAmt = MI.getOperand(4).getImm();
		bool ArithmeticShift = false;
		switch (MI.getOpcode()) {
		case AVR::Lsl32:
		ShiftAmt = -ShiftAmt;
		break;
		case AVR::Asr32:
		ArithmeticShift = true;
		break;
		}

		// Read the input registers, with the most significant register at index 0.
		SmallVector<std::pair<Register, int>, 4> Registers;
		Registers.push_back(std::pair(MI.getOperand(3).getReg(), 1));
		Registers.push_back(std::pair(MI.getOperand(3).getReg(), 2));
		Registers.push_back(std::pair(MI.getOperand(2).getReg(), 1));
		Registers.push_back(std::pair(MI.getOperand(2).getReg(), 2));

		// Do the shift. The registers are modified in-place.
		insertMultibyteShift(MI, BB, Registers, ShiftAmt, ArithmeticShift);

		// Combine the 8-bit registers into 16-bit register pairs.
		// For some reason, some right-shift instructions result in better register
		// allocation with the sequence reversed.
		// If we ever start splitting 16-bit pseudo instructions into 8-bit
		// instructions before register allocation, this workaround probably becomes
		// unnecessary.
		if (MI.getOpcode() != AVR::Lsl32 && MI.getOperand(4).getImm() < 16) {
		// Works better with a right shift where registers are moved once.
		BuildMI(*BB, MI, dl, TII.get(AVR::REG_SEQUENCE), MI.getOperand(1).getReg())
		.addReg(Registers[1].first, 0, Registers[1].second)
		.addImm(2)
		.addReg(Registers[0].first, 0, Registers[0].second)
		.addImm(1);
		BuildMI(*BB, MI, dl, TII.get(AVR::REG_SEQUENCE), MI.getOperand(0).getReg())
		.addReg(Registers[3].first, 0, Registers[3].second)
		.addImm(2)
		.addReg(Registers[2].first, 0, Registers[2].second)
		.addImm(1);
		} else {
		// Works better in some other cases.
		BuildMI(*BB, MI, dl, TII.get(AVR::REG_SEQUENCE), MI.getOperand(1).getReg())
		.addReg(Registers[0].first, 0, Registers[0].second)
		.addImm(1)
		.addReg(Registers[1].first, 0, Registers[1].second)
		.addImm(2);
		BuildMI(*BB, MI, dl, TII.get(AVR::REG_SEQUENCE), MI.getOperand(0).getReg())
		.addReg(Registers[2].first, 0, Registers[2].second)
		.addImm(1)
		.addReg(Registers[3].first, 0, Registers[3].second)
		.addImm(2);
		}
benshi001: Have your supplemented tests covered all those special conditions?
aykevl: Not sure. I did test all cases locally (all 93 cases), and this either does not affect or improves code size in every case. You can see a number of tests in D140573 that are optimized this way. In any case, this is an optimization. As long as correct code is generated in both cases, the condition is purely a heuristic.

		MI.eraseFromParent(); // The pseudo instruction is gone now.
		return BB;
		}
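The signed `ShiftAmt` convention (positive means right, negative means left) together with the whole-byte moves and the one-bit fallback can be summarized in a small reference model. This is a sketch under stated assumptions rather than the patch's logic: `modelWideShift` is a hypothetical name, the active window `[First, Last)` stands in for the `Regs.slice(...)` calls, and the 4-bit swap optimization and the special cases handled before the byte moves are deliberately left out. It assumes `-32 < ShiftAmt < 32`.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical reference model of a constant 32-bit shift done byte by byte.
// ShiftAmt > 0 shifts right, ShiftAmt < 0 shifts left. Assumes |ShiftAmt| < 32.
uint32_t modelWideShift(uint32_t Value, int ShiftAmt, bool Arithmetic) {
  // Regs[0] is the most significant byte.
  std::array<uint8_t, 4> Regs = {
      static_cast<uint8_t>(Value >> 24), static_cast<uint8_t>(Value >> 16),
      static_cast<uint8_t>(Value >> 8), static_cast<uint8_t>(Value)};
  size_t First = 0, Last = Regs.size(); // registers still being shifted

  // Sign extension as in the add/sbc pair: `add r, r` moves bit 7 into the
  // carry flag, and `sbc r, r` then yields 0 - carry, i.e. 0x00 or 0xff.
  uint8_t Carry = static_cast<uint8_t>(Regs[0] >> 7);
  uint8_t Ext = Arithmetic ? static_cast<uint8_t>(0 - Carry) : 0;

  while (ShiftAmt <= -8) { // move all registers one to the left
    for (size_t I = First; I + 1 < Last; ++I)
      Regs[I] = Regs[I + 1];
    Regs[Last - 1] = 0; // zero the least significant register
    --Last;             // continue with the leftover registers
    ShiftAmt += 8;
  }
  while (ShiftAmt >= 8) { // move all registers one to the right
    for (size_t I = Last - 1; I != First; --I)
      Regs[I] = Regs[I - 1];
    Regs[First] = Ext; // zero or sign extend
    ++First;           // continue with the leftover registers
    ShiftAmt -= 8;
  }
  while (ShiftAmt < 0) { // one bit left: add on the lowest byte, adc upwards
    unsigned C = 0;
    for (size_t I = Last; I-- > First;) {
      unsigned Sum = Regs[I] * 2u + C;
      Regs[I] = static_cast<uint8_t>(Sum);
      C = Sum >> 8;
    }
    ++ShiftAmt;
  }
  while (ShiftAmt > 0) { // one bit right: asr/lsr on the top byte, ror below
    unsigned C = 0;
    for (size_t I = First; I < Last; ++I) {
      uint8_t In = Regs[I];
      unsigned Top = (I == First) ? (Arithmetic ? (In & 0x80u) : 0u) : (C << 7);
      Regs[I] = static_cast<uint8_t>((In >> 1) | Top);
      C = In & 1u;
    }
    --ShiftAmt;
  }
  return (static_cast<uint32_t>(Regs[0]) << 24) |
         (static_cast<uint32_t>(Regs[1]) << 16) |
         (static_cast<uint32_t>(Regs[2]) << 8) | Regs[3];
}
```

For example, `modelWideShift(V, -12, false)` corresponds to `shl i32 %a, 12`, and `modelWideShift(V, 31, true)` corresponds to `ashr i32 %a, 31`.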

		static bool isCopyMulResult(MachineBasicBlock::iterator const &I) {
		if (I->getOpcode() == AVR::COPY) {
		Register SrcReg = I->getOperand(1).getReg();
		return (SrcReg == AVR::R0 || SrcReg == AVR::R1);
		}

		return false;
		}
[...]
		AVRTargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
[...]
		case AVR::Lsr16:
		case AVR::Rol8:
		case AVR::Rol16:
		case AVR::Ror8:
		case AVR::Ror16:
		case AVR::Asr8:
		case AVR::Asr16:
		return insertShift(MI, MBB);
		case AVR::Lsl32:
		case AVR::Lsr32:
		case AVR::Asr32:
		return insertWideShift(MI, MBB);
		case AVR::MULRdRr:
		case AVR::MULSRdRr:
		return insertMul(MI, MBB);
		case AVR::CopyR1:
		return insertCopyR1(MI, MBB);
		case AVR::AtomicLoadAdd8:
		return insertAtomicArithmeticOp(MI, MBB, AVR::ADDRdRr, 8);
		case AVR::AtomicLoadAdd16:
[...]

llvm/lib/Target/AVR/AVRInstrInfo.td

[...]
		def AVRlsrlo : SDNode<"AVRISD::LSRLO", SDTIntUnaryOp>;
		def AVRasrlo : SDNode<"AVRISD::ASRLO", SDTIntUnaryOp>;
		def AVRlslbn : SDNode<"AVRISD::LSLBN", SDTIntBinOp>;
		def AVRlsrbn : SDNode<"AVRISD::LSRBN", SDTIntBinOp>;
		def AVRasrbn : SDNode<"AVRISD::ASRBN", SDTIntBinOp>;
		def AVRlslwn : SDNode<"AVRISD::LSLWN", SDTIntBinOp>;
		def AVRlsrwn : SDNode<"AVRISD::LSRWN", SDTIntBinOp>;
		def AVRasrwn : SDNode<"AVRISD::ASRWN", SDTIntBinOp>;
		def AVRlslw : SDNode<"AVRISD::LSLW", SDTIntShiftDOp>;
		def AVRlsrw : SDNode<"AVRISD::LSRW", SDTIntShiftDOp>;
		def AVRasrw : SDNode<"AVRISD::ASRW", SDTIntShiftDOp>;

		// Pseudo shift nodes for non-constant shift amounts.
		def AVRlslLoop : SDNode<"AVRISD::LSLLOOP", SDTIntShiftOp>;
		def AVRlsrLoop : SDNode<"AVRISD::LSRLOOP", SDTIntShiftOp>;
		def AVRrolLoop : SDNode<"AVRISD::ROLLOOP", SDTIntShiftOp>;
		def AVRrorLoop : SDNode<"AVRISD::RORLOOP", SDTIntShiftOp>;
		def AVRasrLoop : SDNode<"AVRISD::ASRLOOP", SDTIntShiftOp>;

[...]
		def Lsl16 : ShiftPseudo<(outs DREGS
[...]
		(ins DREGS
		: $src, GPR8
		: $cnt),
		"# Lsl16 PSEUDO", [(set i16
		: $dst, (AVRlslLoop i16
		: $src, i8
		: $cnt))]>;

		def Lsl32 : ShiftPseudo<(outs DREGS:$dstlo, DREGS:$dsthi),
		(ins DREGS:$srclo, DREGS:$srchi, i8imm:$cnt),
		"# Lsl32 PSEUDO",
		[(set i16:$dstlo, i16:$dsthi, (AVRlslw i16:$srclo, i16:$srchi, i8:$cnt))]>;

		def Lsr8 : ShiftPseudo<(outs GPR8
		: $dst),
		(ins GPR8
		: $src, GPR8
		: $cnt),
		"# Lsr8 PSEUDO", [(set i8
		: $dst, (AVRlsrLoop i8
		: $src, i8
		: $cnt))]>;

		def Lsr16 : ShiftPseudo<(outs DREGS
		: $dst),
		(ins DREGS
		: $src, GPR8
		: $cnt),
		"# Lsr16 PSEUDO", [(set i16
		: $dst, (AVRlsrLoop i16
		: $src, i8
		: $cnt))]>;

		def Lsr32 : ShiftPseudo<(outs DREGS:$dstlo, DREGS:$dsthi),
		(ins DREGS:$srclo, DREGS:$srchi, i8imm:$cnt),
		"# Lsr32 PSEUDO",
		[(set i16:$dstlo, i16:$dsthi, (AVRlsrw i16:$srclo, i16:$srchi, i8:$cnt))]>;

		def Rol8 : ShiftPseudo<(outs GPR8
		: $dst),
		(ins GPR8
		: $src, GPR8
		: $cnt),
		"# Rol8 PSEUDO", [(set i8
		: $dst, (AVRrolLoop i8
		: $src, i8
[...]
		def Asr16 : ShiftPseudo<(outs DREGS
[...]
		(ins DREGS
		: $src, GPR8
		: $cnt),
		"# Asr16 PSEUDO", [(set i16
		: $dst, (AVRasrLoop i16
		: $src, i8
		: $cnt))]>;

		def Asr32 : ShiftPseudo<(outs DREGS:$dstlo, DREGS:$dsthi),
		(ins DREGS:$srclo, DREGS:$srchi, i8imm:$cnt),
		"# Asr32 PSEUDO",
		[(set i16:$dstlo, i16:$dsthi, (AVRasrw i16:$srclo, i16:$srchi, i8:$cnt))]>;

		// lowered to a copy from R1, which contains the value zero.
		let usesCustomInserter=1 in
		def CopyR1 : Pseudo<(outs GPR8:$rd), (ins), "clrz\t$rd", [(set i8:$rd, 0)]>;

		//===----------------------------------------------------------------------===//
		// Non-Instruction Patterns
		//===----------------------------------------------------------------------===//

[...]

llvm/test/CodeGen/AVR/shift32.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple=avr -mattr=movw -verify-machineinstrs | FileCheck %s

benshi001: Also check for AVRTiny?
aykevl: The only difference is the lack of `movw` (which isn't emitted by this patch; instead it uses `AVR::COPY`, which is lowered to either `movw` or two `mov` instructions depending on support). All other instructions are supported by avrtiny, so I do not think testing for AVRTiny is necessary here. (I could add it of course if you think it's useful, but the test output will likely be near-identical.)
				; Lowering of constant 32-bit shift instructions.
				; The main reason these functions are tested separate from shift.ll is because
				; of update_llc_test_checks.py.

				define i32 @shl_i32_1(i32 %a) {
				; CHECK-LABEL: shl_i32_1:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: lsl r22
				; CHECK-NEXT: rol r23
				; CHECK-NEXT: rol r24
				; CHECK-NEXT: rol r25
				; CHECK-NEXT: ret
				%res = shl i32 %a, 1
				ret i32 %res
				}

				define i32 @shl_i32_2(i32 %a) {
				; CHECK-LABEL: shl_i32_2:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: lsl r22
				; CHECK-NEXT: rol r23
				; CHECK-NEXT: rol r24
				; CHECK-NEXT: rol r25
				; CHECK-NEXT: lsl r22
				; CHECK-NEXT: rol r23
				; CHECK-NEXT: rol r24
				; CHECK-NEXT: rol r25
				; CHECK-NEXT: ret
				%res = shl i32 %a, 2
				ret i32 %res
				}

				define i32 @shl_i32_4(i32 %a) {
				; CHECK-LABEL: shl_i32_4:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: swap r25
				; CHECK-NEXT: andi r25, 240
				; CHECK-NEXT: swap r24
				; CHECK-NEXT: eor r25, r24
				; CHECK-NEXT: andi r24, 240
				; CHECK-NEXT: eor r25, r24
				; CHECK-NEXT: swap r23
				; CHECK-NEXT: eor r24, r23
				; CHECK-NEXT: andi r23, 240
				; CHECK-NEXT: eor r24, r23
				; CHECK-NEXT: swap r22
				; CHECK-NEXT: eor r23, r22
				; CHECK-NEXT: andi r22, 240
				; CHECK-NEXT: eor r23, r22
				; CHECK-NEXT: ret
				%res = shl i32 %a, 4
				ret i32 %res
				}

				define i32 @shl_i32_5(i32 %a) {
				; CHECK-LABEL: shl_i32_5:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: swap r25
				; CHECK-NEXT: andi r25, 240
				; CHECK-NEXT: swap r24
				; CHECK-NEXT: eor r25, r24
				; CHECK-NEXT: andi r24, 240
				; CHECK-NEXT: eor r25, r24
				; CHECK-NEXT: swap r23
				; CHECK-NEXT: eor r24, r23
				; CHECK-NEXT: andi r23, 240
				; CHECK-NEXT: eor r24, r23
				; CHECK-NEXT: swap r22
				; CHECK-NEXT: eor r23, r22
				; CHECK-NEXT: andi r22, 240
				; CHECK-NEXT: eor r23, r22
				; CHECK-NEXT: lsl r22
				; CHECK-NEXT: rol r23
				; CHECK-NEXT: rol r24
				; CHECK-NEXT: rol r25
				; CHECK-NEXT: ret
				%res = shl i32 %a, 5
				ret i32 %res
				}

				define i32 @shl_i32_6(i32 %a) {
				; CHECK-LABEL: shl_i32_6:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: lsr r25
				; CHECK-NEXT: ror r24
				; CHECK-NEXT: ror r23
				; CHECK-NEXT: ror r22
				; CHECK-NEXT: mov r18, r1
				; CHECK-NEXT: ror r18
				; CHECK-NEXT: lsr r25
				; CHECK-NEXT: ror r24
				; CHECK-NEXT: ror r23
				; CHECK-NEXT: ror r22
				; CHECK-NEXT: ror r18
				; CHECK-NEXT: mov r25, r24
				; CHECK-NEXT: mov r24, r23
				; CHECK-NEXT: mov r19, r22
				; CHECK-NEXT: movw r22, r18
				; CHECK-NEXT: ret
				%res = shl i32 %a, 6
				ret i32 %res
				}


				define i32 @shl_i32_7(i32 %a) {
				; CHECK-LABEL: shl_i32_7:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: lsr r25
				; CHECK-NEXT: ror r24
				; CHECK-NEXT: ror r23
				; CHECK-NEXT: ror r22
				; CHECK-NEXT: mov r18, r1
				; CHECK-NEXT: ror r18
				; CHECK-NEXT: mov r25, r24
				; CHECK-NEXT: mov r24, r23
				; CHECK-NEXT: mov r19, r22
				; CHECK-NEXT: movw r22, r18
				; CHECK-NEXT: ret
				%res = shl i32 %a, 7
				ret i32 %res
				}

				define i32 @shl_i32_8(i32 %a) {
				; CHECK-LABEL: shl_i32_8:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: mov r25, r24
				; CHECK-NEXT: mov r24, r23
				; CHECK-NEXT: mov r23, r22
				; CHECK-NEXT: mov r22, r1
				; CHECK-NEXT: ret
				%res = shl i32 %a, 8
				ret i32 %res
				}

				define i32 @shl_i32_15(i32 %a) {
				; CHECK-LABEL: shl_i32_15:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: movw r18, r22
				; CHECK-NEXT: lsr r24
				; CHECK-NEXT: ror r19
				; CHECK-NEXT: ror r18
				; CHECK-NEXT: mov r23, r1
				; CHECK-NEXT: ror r23
				; CHECK-NEXT: mov r22, r1
				; CHECK-NEXT: movw r24, r18
				; CHECK-NEXT: ret
				%res = shl i32 %a, 15
				ret i32 %res
				}

				; Combined with the register allocator, shift instructions can sometimes be
				; optimized away entirely. The least significant registers are simply stored
				; directly instead of moving them first.
				; TODO: the `mov Rd, r1` instructions are needed because most of the
				; instructions are 16-bits and instructions are only split after register
				; allocation. These two instructions could be avoided if the 16-bit store
				; instruction was split into two 8-bit store instructions before register
				; allocation. That would make this shift a no-op.
				define void @shl_i32_16_ptr(i32 %a, ptr %ptr) {
				; CHECK-LABEL: shl_i32_16_ptr:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: mov r25, r1
				; CHECK-NEXT: mov r24, r1
				; CHECK-NEXT: movw r30, r20
				; CHECK-NEXT: std Z+2, r22
				; CHECK-NEXT: std Z+3, r23
				; CHECK-NEXT: st Z, r24
				; CHECK-NEXT: std Z+1, r25
				; CHECK-NEXT: ret
				%res = shl i32 %a, 16
				store i32 %res, ptr %ptr
				ret void
				}

				define i32 @shl_i32_28(i32 %a) {
				; CHECK-LABEL: shl_i32_28:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: swap r22
				; CHECK-NEXT: andi r22, 240
				; CHECK-NEXT: mov r25, r22
				; CHECK-NEXT: mov r24, r1
				; CHECK-NEXT: mov r23, r1
				; CHECK-NEXT: mov r22, r1
				; CHECK-NEXT: ret
				%res = shl i32 %a, 28
				ret i32 %res
				}

				define i32 @shl_i32_31(i32 %a) {
				; CHECK-LABEL: shl_i32_31:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: lsr r22
				; CHECK-NEXT: mov r25, r1
				; CHECK-NEXT: ror r25
				; CHECK-NEXT: mov r24, r1
				; CHECK-NEXT: mov r23, r1
				; CHECK-NEXT: mov r22, r1
				; CHECK-NEXT: ret
				%res = shl i32 %a, 31
				ret i32 %res
				}

				define i32 @lshr_i32_1(i32 %a) {
				; CHECK-LABEL: lshr_i32_1:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: lsr r25
				; CHECK-NEXT: ror r24
				; CHECK-NEXT: ror r23
				; CHECK-NEXT: ror r22
				; CHECK-NEXT: ret
				%res = lshr i32 %a, 1
				ret i32 %res
				}

				define i32 @lshr_i32_2(i32 %a) {
				; CHECK-LABEL: lshr_i32_2:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: lsr r25
				; CHECK-NEXT: ror r24
				; CHECK-NEXT: ror r23
				; CHECK-NEXT: ror r22
				; CHECK-NEXT: lsr r25
				; CHECK-NEXT: ror r24
				; CHECK-NEXT: ror r23
				; CHECK-NEXT: ror r22
				; CHECK-NEXT: ret
				%res = lshr i32 %a, 2
				ret i32 %res
				}

				define i32 @lshr_i32_4(i32 %a) {
				; CHECK-LABEL: lshr_i32_4:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: swap r22
				; CHECK-NEXT: andi r22, 15
				; CHECK-NEXT: swap r23
				; CHECK-NEXT: eor r22, r23
				; CHECK-NEXT: andi r23, 15
				; CHECK-NEXT: eor r22, r23
				; CHECK-NEXT: swap r24
				; CHECK-NEXT: eor r23, r24
				; CHECK-NEXT: andi r24, 15
				; CHECK-NEXT: eor r23, r24
				; CHECK-NEXT: swap r25
				; CHECK-NEXT: eor r24, r25
				; CHECK-NEXT: andi r25, 15
				; CHECK-NEXT: eor r24, r25
				; CHECK-NEXT: ret
				%res = lshr i32 %a, 4
				ret i32 %res
				}

				; TODO: this could be optimized to 4 movs, instead of five.
				define i32 @lshr_i32_8(i32 %a) {
				; CHECK-LABEL: lshr_i32_8:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: mov r18, r25
				; CHECK-NEXT: mov r19, r1
				; CHECK-NEXT: mov r22, r23
				; CHECK-NEXT: mov r23, r24
				; CHECK-NEXT: movw r24, r18
				; CHECK-NEXT: ret
				%res = lshr i32 %a, 8
				ret i32 %res
				}

				define i32 @lshr_i32_31(i32 %a) {
				; CHECK-LABEL: lshr_i32_31:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: lsl r25
				; CHECK-NEXT: mov r22, r1
				; CHECK-NEXT: rol r22
				; CHECK-NEXT: mov r25, r1
				; CHECK-NEXT: mov r24, r1
				; CHECK-NEXT: mov r23, r1
				; CHECK-NEXT: ret
				%res = lshr i32 %a, 31
				ret i32 %res
				}

				define i32 @ashr_i32_1(i32 %a) {
				; CHECK-LABEL: ashr_i32_1:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: asr r25
				; CHECK-NEXT: ror r24
				; CHECK-NEXT: ror r23
				; CHECK-NEXT: ror r22
				; CHECK-NEXT: ret
				%res = ashr i32 %a, 1
				ret i32 %res
				}

				define i32 @ashr_i32_2(i32 %a) {
				; CHECK-LABEL: ashr_i32_2:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: asr r25
				; CHECK-NEXT: ror r24
				; CHECK-NEXT: ror r23
				; CHECK-NEXT: ror r22
				; CHECK-NEXT: asr r25
				; CHECK-NEXT: ror r24
				; CHECK-NEXT: ror r23
				; CHECK-NEXT: ror r22
				; CHECK-NEXT: ret
				%res = ashr i32 %a, 2
				ret i32 %res
				}

				; can't use the swap/andi/eor trick here
				define i32 @ashr_i32_4(i32 %a) {
				; CHECK-LABEL: ashr_i32_4:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: asr r25
				; CHECK-NEXT: ror r24
				; CHECK-NEXT: ror r23
				; CHECK-NEXT: ror r22
				; CHECK-NEXT: asr r25
				; CHECK-NEXT: ror r24
				; CHECK-NEXT: ror r23
				; CHECK-NEXT: ror r22
				; CHECK-NEXT: asr r25
				; CHECK-NEXT: ror r24
				; CHECK-NEXT: ror r23
				; CHECK-NEXT: ror r22
				; CHECK-NEXT: asr r25
				; CHECK-NEXT: ror r24
				; CHECK-NEXT: ror r23
				; CHECK-NEXT: ror r22
				; CHECK-NEXT: ret
				%res = ashr i32 %a, 4
				ret i32 %res
				}

				define i32 @ashr_i32_7(i32 %a) {
				; CHECK-LABEL: ashr_i32_7:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: lsl r22
				; CHECK-NEXT: rol r23
				; CHECK-NEXT: rol r24
				; CHECK-NEXT: rol r25
				; CHECK-NEXT: sbc r19, r19
				; CHECK-NEXT: mov r18, r25
				; CHECK-NEXT: mov r22, r23
				; CHECK-NEXT: mov r23, r24
				; CHECK-NEXT: movw r24, r18
				; CHECK-NEXT: ret
				%res = ashr i32 %a, 7
				ret i32 %res
				}

				define i32 @ashr_i32_8(i32 %a) {
				; CHECK-LABEL: ashr_i32_8:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: mov r19, r25
				; CHECK-NEXT: lsl r19
				; CHECK-NEXT: sbc r19, r19
				; CHECK-NEXT: mov r18, r25
				; CHECK-NEXT: mov r22, r23
				; CHECK-NEXT: mov r23, r24
				; CHECK-NEXT: movw r24, r18
				; CHECK-NEXT: ret
				%res = ashr i32 %a, 8
				ret i32 %res
				}

				define i32 @ashr_i32_16(i32 %a) {
				; CHECK-LABEL: ashr_i32_16:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: movw r22, r24
				; CHECK-NEXT: lsl r25
				; CHECK-NEXT: sbc r25, r25
				; CHECK-NEXT: mov r24, r25
				; CHECK-NEXT: ret
				%res = ashr i32 %a, 16
				ret i32 %res
				}

				define i32 @ashr_i32_23(i32 %a) {
				; CHECK-LABEL: ashr_i32_23:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: lsl r24
				; CHECK-NEXT: rol r25
				; CHECK-NEXT: sbc r19, r19
				; CHECK-NEXT: mov r18, r19
				; CHECK-NEXT: mov r23, r19
				; CHECK-NEXT: mov r22, r25
				; CHECK-NEXT: movw r24, r18
				; CHECK-NEXT: ret
				%res = ashr i32 %a, 23
				ret i32 %res
				}

				define i32 @ashr_i32_30(i32 %a) {
				; CHECK-LABEL: ashr_i32_30:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: lsl r25
				; CHECK-NEXT: sbc r19, r19
				; CHECK-NEXT: lsl r25
				; CHECK-NEXT: mov r22, r19
				; CHECK-NEXT: rol r22
				; CHECK-NEXT: mov r18, r19
				; CHECK-NEXT: mov r23, r19
				; CHECK-NEXT: movw r24, r18
				; CHECK-NEXT: ret
				%res = ashr i32 %a, 30
				ret i32 %res
				}

				define i32 @ashr_i32_31(i32 %a) {
				; CHECK-LABEL: ashr_i32_31:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: lsl r25
				; CHECK-NEXT: sbc r23, r23
				; CHECK-NEXT: mov r22, r23
				; CHECK-NEXT: movw r24, r22
				; CHECK-NEXT: ret
				%res = ashr i32 %a, 31
				ret i32 %res
				}

Diff 477331
