This is an archive of the discontinued LLVM Phabricator instance.

Differential D138529

[AVR] Optimize constant 32-bit shifts
Abandoned · Public

Authored by aykevl on Nov 22 2022, 3:50 PM.

Details

Reviewers

dylanmckay
benshi001

Summary

32-bit shift instructions were previously expanded by the default SelectionDAG expander, which lowered them to 16-bit constant shifts whose results were ORed together. This works, but is far from optimal.
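
To make the old lowering concrete, here is a minimal sketch of a 32-bit left shift built from two 16-bit halves whose results are ORed together. It is only an illustration in plain C++: `shl32ViaHalves` is a hypothetical name, the sketch covers shift amounts 1 to 15 only, and it does not claim to reproduce the exact output of the SelectionDAG expander.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the patch): shift a 32-bit value left by N
// (1..15) using only 16-bit operations on its two halves.
uint32_t shl32ViaHalves(uint32_t Value, unsigned N) {
  assert(N >= 1 && N <= 15 && "sketch only covers shift amounts 1..15");
  uint16_t Lo = static_cast<uint16_t>(Value);
  uint16_t Hi = static_cast<uint16_t>(Value >> 16);
  // Bits that cross from the low half into the high half.
  uint16_t Carry = static_cast<uint16_t>(Lo >> (16 - N));
  // Two 16-bit shifts, with the carried bits ORed into the high half.
  uint16_t NewHi = static_cast<uint16_t>((Hi << N) | Carry);
  uint16_t NewLo = static_cast<uint16_t>(Lo << N);
  return (static_cast<uint32_t>(NewHi) << 16) | NewLo;
}
```

On AVR each of these 16-bit shifts is itself expanded into 8-bit instructions, which is where the extra instructions come from.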

The patch here uses a custom instruction inserter to insert optimized constant 32-bit shifts. This is done using three new pseudo-instructions that take the upper and lower halves of the value in two separate 16-bit registers and output two 16-bit registers.

This change results in around 31% fewer instructions on average for constant 32-bit shifts, and is in all cases equal to or better than the old behavior. It also tends to match or outperform avr-gcc: the only cases where avr-gcc does better are when it uses a loop to shift, or when the LLVM register allocator inserts some unnecessary movs. But it even outperforms avr-gcc in some cases where avr-gcc does not use a loop.

As a side effect, non-constant 32-bit shifts also become more efficient.

For some real-world differences: the build of compiler-rt I use in TinyGo becomes 2.7% smaller and the build of picolibc I use becomes 0.9% smaller. I think picolibc is a better representation of real-world code, but even a ~1% reduction in code size is really significant.

I have tested this patch in simavr, with each of the 3 kinds of shifts and each of the 30 shift amounts that can be used.

Future patches can improve on this:

A loop can often result in reduced code size at the expense of speed. We should use a loop when the minsize attribute is set and a loop would need fewer instructions.
For some reason the register allocator inserts some unnecessary moves. I think this can be worked around by fiddling with REG_SEQUENCE, but I'm not sure. A better solution might be to try and split 16-bit instructions into 8-bit instructions before register allocation (something which I've tried a few times but haven't found a good solution for yet).
Because the main algorithm is independent of the number of registers, the same code can be used to emit moves for other shift widths (8, 16, 64, and others if needed). But this can be done in later patches if needed.

I mostly based this on an investigation I did a while ago: https://aykevl.nl/2021/02/avr-bitshift
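
The byte-wise idea behind the patch can be modelled in plain C++. This is a simplified sketch under stated assumptions: `shiftRightBytewise` is a hypothetical name, the four array elements stand in for 8-bit AVR registers with the most significant byte at index 0, a negative amount means a left shift, and the modulo 6/7 and 4-bit special cases of the patch are left out.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model, not LLVM code. Positive Amt shifts right, negative Amt
// shifts left; the magnitude must be below 32.
uint32_t shiftRightBytewise(uint32_t Value, int Amt, bool Arithmetic) {
  assert(Amt > -32 && Amt < 32 && "shift amount out of range");
  // Index 0 holds the most significant byte.
  std::array<uint8_t, 4> R = {uint8_t(Value >> 24), uint8_t(Value >> 16),
                              uint8_t(Value >> 8), uint8_t(Value)};
  // Byte used to fill vacated positions on right shifts.
  const uint8_t Ext = (Arithmetic && (R[0] & 0x80)) ? 0xFF : 0x00;
  // Only the bytes in [Begin, End) still take part in the bit shifts.
  std::size_t Begin = 0, End = R.size();

  // Large steps: move whole bytes, which shrinks the active window.
  while (Amt <= -8) {
    for (std::size_t I = Begin; I + 1 < End; ++I)
      R[I] = R[I + 1];
    R[--End] = 0; // zero the least significant byte
    Amt += 8;
  }
  while (Amt >= 8) {
    for (std::size_t I = End - 1; I > Begin; --I)
      R[I] = R[I - 1];
    R[Begin++] = Ext; // zero or sign extend
    Amt -= 8;
  }

  // Small steps: single-bit shifts with a carry (add/adc on AVR).
  for (; Amt < 0; ++Amt) {
    unsigned Carry = 0;
    for (std::size_t I = End; I-- > Begin;) {
      unsigned Wide = (unsigned(R[I]) << 1) | Carry;
      R[I] = uint8_t(Wide);
      Carry = Wide >> 8;
    }
  }
  // Single-bit right shifts (lsr or asr on the top byte, then ror).
  for (; Amt > 0; --Amt) {
    unsigned Carry = 0;
    for (std::size_t I = Begin; I < End; ++I) {
      unsigned In = R[I];
      unsigned Top =
          (I == Begin) ? (Arithmetic ? (In & 0x80u) : 0u) : (Carry << 7);
      R[I] = uint8_t((In >> 1) | Top);
      Carry = In & 1u;
    }
  }
  return (uint32_t(R[0]) << 24) | (uint32_t(R[1]) << 16) |
         (uint32_t(R[2]) << 8) | R[3];
}
```

For example, `shiftRightBytewise(0x12345678, -12, false)` performs one byte move and four single-bit shifts over the remaining three bytes.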

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

aykevl created this revision. Nov 22 2022, 3:50 PM

Herald added a project: Restricted Project. Nov 22 2022, 3:50 PM

Herald added subscribers: Jim, hiraditya.

aykevl requested review of this revision. Nov 22 2022, 3:50 PM

Herald added a project: Restricted Project. Nov 22 2022, 3:50 PM

Herald added a subscriber: llvm-commits.

aykevl edited the summary of this revision. Nov 22 2022, 3:53 PM

aykevl edited the summary of this revision. Nov 22 2022, 4:05 PM

Harbormaster completed remote builds in B199078: Diff 477331. Nov 22 2022, 4:37 PM

fixed two tests that were failing as a result of this change
some minor other changes

Harbormaster completed remote builds in B199094: Diff 477359. Nov 22 2022, 5:57 PM

aykevl mentioned this in D138582: [AVR] Do not use R0/R1 on avrtiny. Nov 23 2022, 8:43 AM

dnelson_1901 added a subscriber: dnelson_1901. Nov 25 2022, 9:38 AM

aykevl added a child revision: D138582: [AVR] Do not use R0/R1 on avrtiny. Nov 27 2022, 6:39 AM

rebase (no change in patch) - hopefully the pre-merge check now works?

Harbormaster completed remote builds in B199654: Diff 478103. Nov 27 2022, 8:52 AM

I will have a look next week.

For your commit message, I think the first part is OK, but the part from "Future patches can improve on this:" to the end need not be committed.

benshi001 added inline comments. Dec 4 2022, 11:40 PM

llvm/lib/Target/AVR/AVRISelLowering.cpp
286	This message should be `Expected power-of-2 shift operand`; you can also fix it along with your patch.
llvm/lib/Target/AVR/AVRISelLowering.h
42	`LSLW` is a bit confusing with `LSLWN`, so how about `LSLWP`, which means `LSLW-Pair`.

benshi001 added inline comments. Dec 5 2022, 2:30 AM

llvm/lib/Target/AVR/AVRISelLowering.cpp
1963	How about using `GetZeroRegister()` instead of fixed `AVR::R1`? This way should also fit avrtiny. Or we can emit `eor Rx, Rx` instead of `mov Rx, Zero` .

benshi001 added inline comments. Dec 5 2022, 2:33 AM

llvm/lib/Target/AVR/AVRISelLowering.cpp
1866	How about using GetZeroRegister() instead of fixed AVR::R1? This way should also fit avrtiny. Or we can emit eor Rx, Rx instead of mov Rx, Zero .
1919	How about using GetZeroRegister() instead of fixed AVR::R1? This way should also fit avrtiny. Or we can emit eor Rx, Rx instead of mov Rx, Zero .
1990	How about using GetZeroRegister() instead of fixed AVR::R1? This way should also fit avrtiny. Or we can emit eor Rx, Rx instead of mov Rx, Zero .

Is it possible to break this large patch into smaller ones, in which the first one is just the skeleton (with only shiftAmount = 1 implemented as the basic case)? Then later ones implement each special shiftAmount case.

benshi001 added inline comments. Dec 5 2022, 4:14 AM

llvm/lib/Target/AVR/AVRISelLowering.cpp
291	llvm_unreachable("Expected a constant shift amount!");
302	Is it OK for `ShiftAmount` to have type `MVT::i8`? Would `MVT::i16` be better?
314	Would it be better to use `std::map` or equivalent but more efficient llvm utilities?

use getZeroRegister instead of AVR::R1
use AVR::sub_lo and AVR::sub_hi instead of numeric constants
optimize REG_SEQUENCE

Thank you for taking a look! I will update the patch later with your suggestions. For now, I've updated the patch a bit with changes I made locally in the past few days.
(I will also update the patch with extra context that I forgot to add this time).

llvm/lib/Target/AVR/AVRISelLowering.cpp
302	32-bit shift instructions will shift at most 31 bits, so i8 is good enough.
314	Do you know of any? Other code seems to use a `switch` and I think this is very readable.
1866	This was already fixed in my local changes (this patch is older than D138582).

In D138529#3970417, @benshi001 wrote:

Is it possible to break this large patch into smaller ones, in which the first one is just the skeleton (with only shiftAmount = 1 implemented as the basic case)? Then later ones implement each special shiftAmount case.

I didn't split the patch because this way, there is no code size regression. But if you prefer I can split the patch in multiple patches and commit them all at the same time once accepted.

EDIT: yeah I agree, splitting this patch is better. I'll do that in the next update.

benshi001 added inline comments. Dec 5 2022, 6:43 PM

llvm/lib/Target/AVR/AVRISelLowering.cpp

314

Though switch is more readable, it seems boring, in my opinion.

So I suggest

std::map<unsigned, unsigned> OpMap = {{ISD::SHL, AVRISD::LSLW},
                                      {ISD::SRL, AVRISD::LSRW}, 
                                      {ISD::SRA, AVRISD::ASRW}};
assert(OpMap.find(Op.getOpcode()) != OpMap.end() &&
       "Unexpected shift opcode");
unsigned Opc = OpMap[Op.getOpcode()];

which looks more clear.
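
To compare the two styles under discussion side by side, here is a small self-contained sketch. The enums are hypothetical stand-ins for the `ISD::` and `AVRISD::` opcodes, and neither function is taken from the patch. The map variant uses a single `find` rather than `find` followed by `operator[]`, which avoids a second lookup.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-ins for ISD::SHL/SRL/SRA and AVRISD::LSLW/LSRW/ASRW.
enum class GenericShiftOp { SHL, SRL, SRA };
enum class WideShiftOp { LSLW, LSRW, ASRW };

// Switch-based selection, in the style the patch uses.
WideShiftOp selectWithSwitch(GenericShiftOp Op) {
  switch (Op) {
  case GenericShiftOp::SHL:
    return WideShiftOp::LSLW;
  case GenericShiftOp::SRL:
    return WideShiftOp::LSRW;
  case GenericShiftOp::SRA:
    return WideShiftOp::ASRW;
  }
  assert(false && "Unexpected shift opcode");
  return WideShiftOp::LSLW;
}

// Map-based selection, in the style the reviewer suggests.
WideShiftOp selectWithMap(GenericShiftOp Op) {
  static const std::map<GenericShiftOp, WideShiftOp> OpMap = {
      {GenericShiftOp::SHL, WideShiftOp::LSLW},
      {GenericShiftOp::SRL, WideShiftOp::LSRW},
      {GenericShiftOp::SRA, WideShiftOp::ASRW}};
  auto It = OpMap.find(Op);
  assert(It != OpMap.end() && "Unexpected shift opcode");
  return It->second;
}
```

Both return the same opcode for every input; the difference is purely one of style and of the small runtime cost of the map.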

benshi001 added inline comments. Dec 5 2022, 8:04 PM

llvm/lib/Target/AVR/AVRISelLowering.cpp
2146	Do your added tests cover all of those special conditions?

benshi001 added inline comments. Dec 5 2022, 11:45 PM

llvm/test/CodeGen/AVR/shift32.ll
2	Also check for AVRTiny?

benshi001 added inline comments. Dec 6 2022, 12:17 AM

llvm/lib/Target/AVR/AVRISelLowering.cpp
1843	I think using a negative ShiftAmt for a left shift is counter-intuitive. It would be better to add one more argument, `bool leftShift`, or to directly expose the opcode `{shl, lshr, ashr}`.

benshi001 added inline comments. Dec 6 2022, 12:24 AM

llvm/lib/Target/AVR/AVRISelLowering.cpp
1861	Renaming `ShiftBytes` to `ShiftRegsSize` looks better.
1906	Renaming `ShiftBytes` to `ShiftRegsSize` looks better.

benshi001 added inline comments. Dec 6 2022, 12:36 AM

llvm/lib/Target/AVR/AVRISelLowering.cpp
1855	Would it be better to split this large function `insertMultibyteShift` into smaller ones, one for each `if` statement? For example, this `if` can be a standalone function `insertMultibyteShiftMod6_7` (or any better name).
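
For context, the branch in question handles shifts whose amount modulo 8 is 6 or 7 by shifting one or two bits in the opposite direction and then moving whole bytes. A minimal model of the left-shift case is sketched below. It is an illustration only: `shl32By6Or7` is a hypothetical name, it covers only the amounts 6 and 7 (no additional whole-byte offset), and it works on a plain `uint32_t` instead of individual registers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model (not LLVM code) of a 32-bit left shift by 6 or 7 bits,
// done as a right shift by 2 or 1 bits followed by a one-byte move to the left.
uint32_t shl32By6Or7(uint32_t Value, unsigned N) {
  assert((N == 6 || N == 7) && "sketch only covers shift amounts 6 and 7");
  uint8_t LowByte = 0; // new least significant byte, starts out zero
  for (unsigned I = 0; I < 8 - N; ++I) {
    unsigned Carry = Value & 1u; // bit that falls out on the right
    Value >>= 1;                 // single-bit right shift over all bytes
    // Rotate the carry into the top of the new low byte (ror on AVR).
    LowByte = uint8_t((LowByte >> 1) | (Carry << 7));
  }
  // Move every byte one position to the left; the old top byte is dropped.
  return (Value << 8) | LowByte;
}
```

The result matches a plain `Value << N`, but the model needs only one or two single-bit passes over the bytes instead of six or seven.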

benshi001 added inline comments. Dec 7 2022, 2:54 AM

llvm/lib/Target/AVR/AVRISelLowering.cpp
1934	This `if` statement is not tested for ashr. It would be better to add an ashr6 or ashr30 test.

benshi001 added inline comments. Dec 7 2022, 2:58 AM

llvm/lib/Target/AVR/AVRISelLowering.cpp
1949	`Ext` and `ExtMore` seem confusing. Would it be better to rename `Ext` -> `HighByte` and `ExtMore` -> `ExtByte`?

benshi001 added inline comments. Dec 7 2022, 3:07 AM

llvm/lib/Target/AVR/AVRISelLowering.cpp
2008	So here should be an `assert((ShiftAmt > -8 && ShiftAmt < 8) && "Unexpect shift amount");`

benshi001 added inline comments. Dec 7 2022, 3:41 AM

llvm/lib/Target/AVR/AVRISelLowering.cpp
2004	I am a bit confused here. With `Regs = Regs.slice(1, Regs.size() - 1);`, is the element at index 0 dropped, so that only 3 elements are left in `Regs`?
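
For reference, `slice(N, M)` on an LLVM `ArrayRef` or `MutableArrayRef` drops the first N elements and keeps the next M, so `slice(1, size - 1)` keeps everything except element 0 while still referring to the same underlying storage. The sketch below mimics that behavior with a hypothetical `View` type; it is a stand-in for illustration, not the LLVM class, and `dropFront` and `dropBack` are assumed names modelled on `drop_front` and `drop_back`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimal stand-in for a non-owning mutable array view.
template <typename T> struct View {
  T *Data;
  std::size_t Size;

  // Drop the first N elements and keep the next M.
  View slice(std::size_t N, std::size_t M) const {
    assert(N + M <= Size && "slice out of range");
    return {Data + N, M};
  }
  // Convenience wrappers that state the intent more directly.
  View dropFront(std::size_t N = 1) const { return slice(N, Size - N); }
  View dropBack(std::size_t N = 1) const { return slice(0, Size - N); }

  T &operator[](std::size_t I) const {
    assert(I < Size && "index out of range");
    return Data[I];
  }
};
```

With a four-element view, `slice(1, 3)` and `dropFront()` both yield a three-element view starting at the original index 1, and writes through that view remain visible in the original array.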

I created a series of patches to replace this one, for easier reviewing. Thank you for the thorough review!
See: D140569, D140570, D140571, D140572, D140573

llvm/lib/Target/AVR/AVRISelLowering.cpp
314	I'm not really convinced this is more readable. I like boring code, it means it is easy to understand.
1843	Hmm, you have a point here. But I kind of like using the signedness of the value. What do you think of renaming it to `ShiftRightAmt` instead? Then it's clearer that a negative number will be a left shift. I prefer to avoid opcodes because that would make the code less generic (I want to write a 64-bit shift later, for example).
1855	What would be the benefit of that?
1861	👍 sounds good
1934	It is, it is tested by `ashr_i32_30`. I have added a few more tests in my updated code.
1949	Fair enough. I'll update the code with these new names and some extra comments to explain what they do.
2004	Yes. I'll update the code to use `drop_front` and `drop_back` instead.
2008	👍 seems fine by me.
2146	Not sure, I did test all cases locally (all 93 cases) and this either does not affect or improves code size in all cases. You can see a number of tests in D140573 that are optimized this way. In any case, this is an optimization. As long as correct code is generated in both cases, the condition is purely a heuristic.
llvm/test/CodeGen/AVR/shift32.ll
2	The only difference is the lack of `movw` (which isn't emitted by this patch; instead it uses `AVR::COPY`, which is lowered to either `movw` or two `mov` instructions depending on support). All other instructions are supported by avrtiny. So I do not think testing for AVRTiny is necessary here. (I could add it of course if you think it's useful, but the test output will likely be near-identical).

aykevl mentioned this in D140569: [AVR] Custom lower 32-bit shift instructions. Dec 26 2022, 9:55 AM

You can close this patch.

Closing, this has been replaced by multiple smaller patches.

Revision Contents

Path                                          Size
llvm/lib/Target/AVR/AVRISelLowering.h         5 lines
llvm/lib/Target/AVR/AVRISelLowering.cpp       388 lines
llvm/lib/Target/AVR/AVRInstrInfo.td           18 lines
llvm/test/CodeGen/AVR/avr-rust-issue-123.ll   6 lines
llvm/test/CodeGen/AVR/return.ll               8 lines
llvm/test/CodeGen/AVR/shift32.ll              472 lines

Diff 480081

llvm/lib/Target/AVR/AVRISelLowering.h

Context not available.
	LSLBN, ///< Byte logical shift left N bits.
	LSLWN, ///< Word logical shift left N bits.
	LSLHI, ///< Higher 8-bit of word logical shift left.
		LSLW, ///< Wide logical shift left.
	LSR, ///< Logical shift right.
	LSRBN, ///< Byte logical shift right N bits.
	LSRWN, ///< Word logical shift right N bits.
	LSRLO, ///< Lower 8-bit of word logical shift right.
		LSRW, ///< Wide logical shift right.
	ASR, ///< Arithmetic shift right.
	ASRBN, ///< Byte arithmetic shift right N bits.
	ASRWN, ///< Word arithmetic shift right N bits.
	ASRLO, ///< Lower 8-bit of word arithmetic shift right.
		ASRW, ///< Wide arithmetic shift right.
	ROR, ///< Bit rotate right.
	ROL, ///< Bit rotate left.
	LSLLOOP, ///< A loop of single logical shift left instructions.
Context not available.

	private:
	MachineBasicBlock *insertShift(MachineInstr &MI, MachineBasicBlock *BB) const;
		MachineBasicBlock *insertWideShift(MachineInstr &MI,
		MachineBasicBlock *BB) const;
	MachineBasicBlock *insertMul(MachineInstr &MI, MachineBasicBlock *BB) const;
	MachineBasicBlock *insertCopyZero(MachineInstr &MI,
	MachineBasicBlock *BB) const;
Context not available.

llvm/lib/Target/AVR/AVRISelLowering.cpp

Context not available.
	setOperationAction(ISD::SRA, MVT::i16, Custom);
	setOperationAction(ISD::SHL, MVT::i16, Custom);
	setOperationAction(ISD::SRL, MVT::i16, Custom);
		setOperationAction(ISD::SRA, MVT::i32, Custom);
		setOperationAction(ISD::SHL, MVT::i32, Custom);
		setOperationAction(ISD::SRL, MVT::i32, Custom);
	setOperationAction(ISD::SHL_PARTS, MVT::i16, Expand);
	setOperationAction(ISD::SRA_PARTS, MVT::i16, Expand);
	setOperationAction(ISD::SRL_PARTS, MVT::i16, Expand);
Context not available.
	NODE(CALL);
	NODE(WRAPPER);
	NODE(LSL);
		NODE(LSLW);
	NODE(LSR);
		NODE(LSRW);
	NODE(ROL);
	NODE(ROR);
	NODE(ASR);
		NODE(ASRW);
	NODE(LSLLOOP);
	NODE(LSRLOOP);
	NODE(ROLLOOP);
Context not available.
	assert(isPowerOf2_32(VT.getSizeInBits()) &&
	"Expected power-of-2 shift amount");

		if (VT.getSizeInBits() == 32) {
		if (!isa<ConstantSDNode>(N->getOperand(1))) {
		// 32-bit shifts are converted to a loop in IR.
		llvm_unreachable("Expected a constant shift!");
		}
		SDVTList ResTys = DAG.getVTList(MVT::i16, MVT::i16);
		SDValue SrcLo =
		DAG.getNode(ISD::EXTRACT_ELEMENT, dl, MVT::i16, Op.getOperand(0),
		DAG.getConstant(0, dl, MVT::i16));
		SDValue SrcHi =
		DAG.getNode(ISD::EXTRACT_ELEMENT, dl, MVT::i16, Op.getOperand(0),
		DAG.getConstant(1, dl, MVT::i16));
		uint64_t ShiftAmount =
		cast<ConstantSDNode>(N->getOperand(1))->getZExtValue();
		SDValue Cnt = DAG.getTargetConstant(ShiftAmount, dl, MVT::i8);
		unsigned Opc;
		switch (Op.getOpcode()) {
		default:
		llvm_unreachable("Invalid 32-bit shift opcode!");
		case ISD::SHL:
		Opc = AVRISD::LSLW;
		break;
		case ISD::SRL:
		Opc = AVRISD::LSRW;
		break;
		case ISD::SRA:
		Opc = AVRISD::ASRW;
		break;
		}
		SDValue Result = DAG.getNode(Opc, dl, ResTys, SrcLo, SrcHi, Cnt);
		return DAG.getNode(ISD::BUILD_PAIR, dl, MVT::i32, Result.getValue(0),
		Result.getValue(1));
		}

	// Expand non-constant shifts to loops.
	if (!isa<ConstantSDNode>(N->getOperand(1))) {
	switch (Op.getOpcode()) {
Context not available.
	return RemBB;
	}

		// Do a multibyte AVR shift. Insert shift instructions and put the output
		// registers in the Regs array.
		// Because AVR does not have a normal shift instruction (only a single bit shift
		// instruction), we have to emulate this behavior with other instructions.
		// It first tries large steps (moving registers around) and then smaller steps
		// like single bit shifts.
		// Large shifts actually reduce the number of shifted registers, so the below
		// algorithms have to work independently of the number of registers that are
		// shifted.
		// For more information and background, see this blogpost:
		// https://aykevl.nl/2021/02/avr-bitshift
		static void insertMultibyteShift(MachineInstr &MI, MachineBasicBlock *BB,
		MutableArrayRef<std::pair<Register, int>> Regs,
		int64_t ShiftAmt, bool ArithmeticShift) {
		const TargetInstrInfo &TII = *BB->getParent()->getSubtarget().getInstrInfo();
		const AVRSubtarget &STI = BB->getParent()->getSubtarget<AVRSubtarget>();
		MachineRegisterInfo &MRI = BB->getParent()->getRegInfo();
		DebugLoc dl = MI.getDebugLoc();

		// Do a shift modulo 6 or 7. This is a bit more complicated than most shifts
		// and is hard to compose with the rest, so these are special cased.
		// The basic idea is to shift one or two bits in the opposite direction and
		// then move registers around to get the correct end result.
		if (ShiftAmt < 0 && (-ShiftAmt % 8) >= 6) {
		// Left shift modulo 6 or 7.

		// Create a slice of the registers we're going to modify, to ease working
		// with them.
		size_t ShiftRegsOffset = -ShiftAmt / 8;
		size_t ShiftBytes = Regs.size() - ShiftRegsOffset;
		MutableArrayRef<std::pair<Register, int>> ShiftRegs =
		Regs.slice(ShiftRegsOffset, ShiftBytes);

		// Shift one to the right, keeping the least significant bit as the carry
		// bit.
		insertMultibyteShift(MI, BB, ShiftRegs, 1, false);

		// Create zero register.
		Register Zero = MRI.createVirtualRegister(&AVR::GPR8RegClass);
		BuildMI(*BB, MI, dl, TII.get(AVR::COPY), Zero)
		.addReg(STI.getZeroRegister());

		// Rotate the least significant bit from the carry bit into a new register
		// (that starts out zero).
		Register LowByte = MRI.createVirtualRegister(&AVR::GPR8RegClass);
		BuildMI(*BB, MI, dl, TII.get(AVR::RORRd), LowByte).addReg(Zero);

		// Shift one more to the right if this is a modulo-6 shift.
		if (-ShiftAmt % 8 == 6) {
		insertMultibyteShift(MI, BB, ShiftRegs, 1, false);
		Register NewLowByte = MRI.createVirtualRegister(&AVR::GPR8RegClass);
		BuildMI(*BB, MI, dl, TII.get(AVR::RORRd), NewLowByte).addReg(LowByte);
		LowByte = NewLowByte;
		}

		// Move all registers to the left, zeroing the bottom registers as needed.
		for (size_t I = 0; I < Regs.size(); I++) {
		int Idx = I + 1;
		if (Idx < (int)ShiftRegs.size()) {
		Regs[I] = ShiftRegs[Idx];
		} else if (Idx == (int)ShiftRegs.size()) {
		Regs[I] = std::pair(LowByte, 0);
		} else {
		Regs[I] = std::pair(Zero, 0);
		}
		}

		return;
		}

		// Right shift modulo 6 or 7.
		if (ShiftAmt > 0 && (ShiftAmt % 8) >= 6) {
		// Create a view on the registers we're going to modify, to ease working
		// with them.
		size_t ShiftBytes = Regs.size() - (ShiftAmt / 8);
		MutableArrayRef<std::pair<Register, int>> ShiftRegs =
		Regs.slice(0, ShiftBytes);

		// Shift one to the left.
		insertMultibyteShift(MI, BB, ShiftRegs, -1, false);

		// Sign or zero extend the most significant register into a new register.
		Register Ext = MRI.createVirtualRegister(&AVR::GPR8RegClass);
		Register ExtMore = 0;
		if (ArithmeticShift) {
		// Sign-extend bit that was shifted out last.
		BuildMI(*BB, MI, dl, TII.get(AVR::SBCRdRr), Ext)
		.addReg(Ext, RegState::Undef)
		.addReg(Ext, RegState::Undef);
		ExtMore = Ext;
		} else {
		// Create a new zero register for zero extending.
		ExtMore = MRI.createVirtualRegister(&AVR::GPR8RegClass);
		BuildMI(*BB, MI, dl, TII.get(AVR::COPY), ExtMore)
		.addReg(STI.getZeroRegister());
		// Rotate most significant bit into a new register (that starts out zero).
		BuildMI(*BB, MI, dl, TII.get(AVR::ADCRdRr), Ext)
		.addReg(ExtMore)
		.addReg(ExtMore);
		}

		// Shift one more to the left for modulo 6 shifts.
		if (ShiftAmt % 8 == 6) {
		insertMultibyteShift(MI, BB, ShiftRegs, -1, false);
		Register NewExt = MRI.createVirtualRegister(&AVR::GPR8RegClass);
		BuildMI(*BB, MI, dl, TII.get(AVR::ADCRdRr), NewExt)
		.addReg(Ext)
		.addReg(Ext);
		Ext = NewExt;
		}

		// Move all to the right, while sign or zero extending.
		for (int I = Regs.size() - 1; I >= 0; I--) {
		int Idx = I - (Regs.size() - ShiftRegs.size()) - 1;
		if (Idx >= 0) {
		Regs[I] = ShiftRegs[Idx];
		} else if (Idx == -1) {
		Regs[I] = std::pair(Ext, 0);
		} else {
		Regs[I] = std::pair(ExtMore, 0);
		}
		}

		return;
		}

		// For shift amounts of at least one register, simply rename the registers and
		// zero the bottom registers.
		auto MSBReg = Regs[0];
		Register ShrExtendReg = 0;
		while (ShiftAmt <= -8) {
		// Move all registers one to the left.
		for (size_t I = 0; I < Regs.size() - 1; I++) {
		Regs[I] = Regs[I + 1];
		}

		// Zero the least significant register.
		Register Out = MRI.createVirtualRegister(&AVR::GPR8RegClass);
		BuildMI(*BB, MI, dl, TII.get(AVR::COPY), Out).addReg(STI.getZeroRegister());
		Regs[Regs.size() - 1] = std::pair(Out, 0);

		// Continue shifts with the leftover registers.
		Regs = Regs.slice(0, Regs.size() - 1);

		ShiftAmt += 8;
		}
		while (ShiftAmt >= 8) {
		// Move all registers one to the right.
		for (size_t I = Regs.size() - 1; I != 0; I--) {
		Regs[I] = Regs[I - 1];
		}

		// Zero or sign extend the most significant register.
		if (ShrExtendReg == 0) {
		ShrExtendReg = MRI.createVirtualRegister(&AVR::GPR8RegClass);
		if (ArithmeticShift) {
		// Sign extend the most significant register into ShrExtendReg.
		Register Tmp = MRI.createVirtualRegister(&AVR::GPR8RegClass);
		BuildMI(*BB, MI, dl, TII.get(AVR::ADDRdRr), Tmp)
		.addReg(MSBReg.first, 0, MSBReg.second)
		.addReg(MSBReg.first, 0, MSBReg.second);
		BuildMI(*BB, MI, dl, TII.get(AVR::SBCRdRr), ShrExtendReg)
		.addReg(Tmp)
		.addReg(Tmp);
		} else {
		BuildMI(*BB, MI, dl, TII.get(AVR::COPY), ShrExtendReg)
		.addReg(STI.getZeroRegister());
		}
		}
		Regs[0] = std::pair(ShrExtendReg, 0);

		// Continue shifts with the leftover registers.
		Regs = Regs.slice(1, Regs.size() - 1);

		ShiftAmt -= 8;
		}

		// Shift by four bits, using a complicated swap/eor/andi/eor sequence.
		// It only works for logical shifts because the bits shifted in are all
		// zeroes.
		// Example shifting 16 bits (2 bytes):
		//
		// ; shift r1
		// swap r1
		// andi r1, 0xf0
		// ; shift r0
		// swap r0
		// eor r1, r0
		// andi r0, 0xf0
		// eor r1, r0
		if (!ArithmeticShift && (ShiftAmt <= -4 || ShiftAmt >= 4)) {
		Register Prev = 0;
		for (size_t i = 0; i < Regs.size(); i++) {
		size_t Idx = (ShiftAmt < 0) ? i : Regs.size() - i - 1;
		Register SwapReg = MRI.createVirtualRegister(&AVR::LD8RegClass);
		BuildMI(*BB, MI, dl, TII.get(AVR::SWAPRd), SwapReg)
		.addReg(Regs[Idx].first, 0, Regs[Idx].second);
		if (Prev != 0) {
		Register R = MRI.createVirtualRegister(&AVR::GPR8RegClass);
		BuildMI(*BB, MI, dl, TII.get(AVR::EORRdRr), R)
		.addReg(Prev)
		.addReg(SwapReg);
		Prev = R;
		}
		Register AndReg = MRI.createVirtualRegister(&AVR::LD8RegClass);
		BuildMI(*BB, MI, dl, TII.get(AVR::ANDIRdK), AndReg)
		.addReg(SwapReg)
		.addImm((ShiftAmt < 0) ? 0xf0 : 0x0f);
		if (Prev != 0) {
		Register R = MRI.createVirtualRegister(&AVR::GPR8RegClass);
		BuildMI(*BB, MI, dl, TII.get(AVR::EORRdRr), R)
		.addReg(Prev)
		.addReg(AndReg);
		if (ShiftAmt < 0) { // left shift
		Regs[Idx - 1] = std::pair(R, 0);
		} else { // right shift
		Regs[Idx + 1] = std::pair(R, 0);
		}
		}
		Prev = AndReg;
		Regs[Idx] = std::pair(AndReg, 0);
		}
		if (ShiftAmt < 0) {
		ShiftAmt += 4;
		} else {
		ShiftAmt -= 4;
		}
		}

		// Shift by one. This is the fallback that always works, and the shift
		// operation that is used for 1, 2, and 3 bit shifts.
		while (ShiftAmt < 0) {
		// Shift one to the left.
		for (size_t i = 0; i < Regs.size(); i++) {
		size_t Idx = Regs.size() - i - 1;
		Register Out = MRI.createVirtualRegister(&AVR::GPR8RegClass);
		Register In = Regs[Idx].first;
		Register InSubreg = Regs[Idx].second;
		if (i == 0) {
		BuildMI(*BB, MI, dl, TII.get(AVR::ADDRdRr), Out)
		.addReg(In, 0, InSubreg)
		.addReg(In, 0, InSubreg);
		} else {
		BuildMI(*BB, MI, dl, TII.get(AVR::ADCRdRr), Out)
		.addReg(In, 0, InSubreg)
		.addReg(In, 0, InSubreg);
		}
		Regs[Idx] = std::pair(Out, 0);
		}
		ShiftAmt++;
		}
		while (ShiftAmt > 0) {
		// Shift one to the right.
		for (size_t i = 0; i < Regs.size(); i++) {
		Register Out = MRI.createVirtualRegister(&AVR::GPR8RegClass);
		Register In = Regs[i].first;
		Register InSubreg = Regs[i].second;
		if (i == 0) {
		unsigned Opc = ArithmeticShift ? AVR::ASRRd : AVR::LSRRd;
		BuildMI(*BB, MI, dl, TII.get(Opc), Out).addReg(In, 0, InSubreg);
		} else {
		BuildMI(*BB, MI, dl, TII.get(AVR::RORRd), Out).addReg(In, 0, InSubreg);
		}
		Regs[i] = std::pair(Out, 0);
		}
		ShiftAmt--;
		}

		if (ShiftAmt != 0) {
		llvm_unreachable("don't know how to shift!"); // sanity check
		}
		}

		// Do a wide (32-bit) shift.
		MachineBasicBlock *
		AVRTargetLowering::insertWideShift(MachineInstr &MI,
		MachineBasicBlock *BB) const {
		const TargetInstrInfo &TII = *Subtarget.getInstrInfo();
		DebugLoc dl = MI.getDebugLoc();

		// How much to shift to the right (meaning: a negative number indicates a left
		// shift).
		int64_t ShiftAmt = MI.getOperand(4).getImm();
		bool ArithmeticShift = false;
		switch (MI.getOpcode()) {
		case AVR::Lsl32:
		ShiftAmt = -ShiftAmt;
		break;
		case AVR::Asr32:
		ArithmeticShift = true;
		break;
		}

		// Read the input registers, with the most significant register at index 0.
		SmallVector<std::pair<Register, int>, 4> Registers;
		Registers.push_back(std::pair(MI.getOperand(3).getReg(), AVR::sub_hi));
		Registers.push_back(std::pair(MI.getOperand(3).getReg(), AVR::sub_lo));
		Registers.push_back(std::pair(MI.getOperand(2).getReg(), AVR::sub_hi));
		Registers.push_back(std::pair(MI.getOperand(2).getReg(), AVR::sub_lo));

		// Do the shift. The registers are modified in-place.
		insertMultibyteShift(MI, BB, Registers, ShiftAmt, ArithmeticShift);

		// Combine the 8-bit registers into 16-bit register pairs.
		// This is done either from LSB to MSB or from MSB to LSB, depending on the
		// shift. It's an optimization so that the register allocator will use the
		// fewest movs possible (which order we use isn't a correctness issue, just an
		// optimization issue).
		// - lsl prefers starting from the most significant byte (2nd case).
		// - lshr prefers starting from the least significant byte (1st case).
		// - for ashr it depends on the number of shifted bytes.
		// Some shift operations still don't get the most optimal mov sequences even
		// with this distinction. TODO: figure out why and try to fix it (but we're
		// already equal to or faster than avr-gcc in all cases except ashr 8).
		if (ShiftAmt > 0 && (!ArithmeticShift || (ShiftAmt < 16 || ShiftAmt >= 22))) {
		benshi001: Have your supplemented tests covered all those special conditions?
		aykevl: Not sure. I did test all cases locally (all 93 cases), and in every case this either improves code size or leaves it unchanged. You can see a number of tests in D140573 that are optimized this way. In any case, this is an optimization. As long as correct code is generated in both cases, the condition is purely a heuristic.
		// Use the resulting registers starting with the least significant byte.
		BuildMI(*BB, MI, dl, TII.get(AVR::REG_SEQUENCE), MI.getOperand(0).getReg())
		.addReg(Registers[3].first, 0, Registers[3].second)
		.addImm(AVR::sub_lo)
		.addReg(Registers[2].first, 0, Registers[2].second)
		.addImm(AVR::sub_hi);
		BuildMI(*BB, MI, dl, TII.get(AVR::REG_SEQUENCE), MI.getOperand(1).getReg())
		.addReg(Registers[1].first, 0, Registers[1].second)
		.addImm(AVR::sub_lo)
		.addReg(Registers[0].first, 0, Registers[0].second)
		.addImm(AVR::sub_hi);
		} else {
		// Use the resulting registers starting with the most significant byte.
		BuildMI(*BB, MI, dl, TII.get(AVR::REG_SEQUENCE), MI.getOperand(1).getReg())
		.addReg(Registers[0].first, 0, Registers[0].second)
		.addImm(AVR::sub_hi)
		.addReg(Registers[1].first, 0, Registers[1].second)
		.addImm(AVR::sub_lo);
		BuildMI(*BB, MI, dl, TII.get(AVR::REG_SEQUENCE), MI.getOperand(0).getReg())
		.addReg(Registers[2].first, 0, Registers[2].second)
		.addImm(AVR::sub_hi)
		.addReg(Registers[3].first, 0, Registers[3].second)
		.addImm(AVR::sub_lo);
		}

		MI.eraseFromParent(); // The pseudo instruction is gone now.
		return BB;
		}

	static bool isCopyMulResult(MachineBasicBlock::iterator const &I) {
	if (I->getOpcode() == AVR::COPY) {
	Register SrcReg = I->getOperand(1).getReg();
Context not available.
	case AVR::Asr8:
	case AVR::Asr16:
	return insertShift(MI, MBB);
		case AVR::Lsl32:
		case AVR::Lsr32:
		case AVR::Asr32:
		return insertWideShift(MI, MBB);
	case AVR::MULRdRr:
	case AVR::MULSRdRr:
	return insertMul(MI, MBB);
Context not available.
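
The swap/eor/andi/eor sequence that `insertMultibyteShift` uses for four-bit logical shifts can be checked on a host machine. The following is a minimal sketch and not part of the patch: `swapNibbles` and `shiftLeft4` are hypothetical helper names, and the byte array is assumed to be ordered most significant byte first, as in the patch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Models the AVR `swap` instruction: exchange the high and low nibble.
static uint8_t swapNibbles(uint8_t V) {
  return static_cast<uint8_t>((V << 4) | (V >> 4));
}

// Shift a 32-bit value left by four bits using only per-byte swap, andi and
// eor steps. B[0] is the most significant byte.
static uint32_t shiftLeft4(uint32_t Value) {
  std::array<uint8_t, 4> B = {
      static_cast<uint8_t>(Value >> 24), static_cast<uint8_t>(Value >> 16),
      static_cast<uint8_t>(Value >> 8), static_cast<uint8_t>(Value)};
  for (std::size_t I = 0; I < B.size(); I++) {
    uint8_t Swapped = swapNibbles(B[I]); // swap rI
    if (I != 0)
      B[I - 1] ^= Swapped;               // eor rPrev, rI
    B[I] = Swapped & 0xf0;               // andi rI, 0xf0
    if (I != 0)
      B[I - 1] ^= B[I];                  // eor rPrev, rI
  }
  return (uint32_t(B[0]) << 24) | (uint32_t(B[1]) << 16) |
         (uint32_t(B[2]) << 8) | uint32_t(B[3]);
}
```

The first `eor` mixes both nibbles of the swapped lower byte into the upper byte, and the second `eor` cancels the high nibble again, so only the bits that crossed the byte boundary remain. Right shifts mirror this with the mask `0x0f`, starting from the least significant byte.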

llvm/lib/Target/AVR/AVRInstrInfo.td

Context not available.
	def AVRlslwn : SDNode<"AVRISD::LSLWN", SDTIntBinOp>;
	def AVRlsrwn : SDNode<"AVRISD::LSRWN", SDTIntBinOp>;
	def AVRasrwn : SDNode<"AVRISD::ASRWN", SDTIntBinOp>;
		def AVRlslw : SDNode<"AVRISD::LSLW", SDTIntShiftDOp>;
		def AVRlsrw : SDNode<"AVRISD::LSRW", SDTIntShiftDOp>;
		def AVRasrw : SDNode<"AVRISD::ASRW", SDTIntShiftDOp>;

	// Pseudo shift nodes for non-constant shift amounts.
	def AVRlslLoop : SDNode<"AVRISD::LSLLOOP", SDTIntShiftOp>;
Context not available.
	: $src, i8
	: $cnt))]>;

		def Lsl32 : ShiftPseudo<(outs DREGS:$dstlo, DREGS:$dsthi),
		(ins DREGS:$srclo, DREGS:$srchi, i8imm:$cnt),
		"# Lsl32 PSEUDO",
		[(set i16:$dstlo, i16:$dsthi, (AVRlslw i16:$srclo, i16:$srchi, i8:$cnt))]>;

	def Lsr8 : ShiftPseudo<(outs GPR8
	: $dst),
	(ins GPR8
Context not available.
	: $src, i8
	: $cnt))]>;

		def Lsr32 : ShiftPseudo<(outs DREGS:$dstlo, DREGS:$dsthi),
		(ins DREGS:$srclo, DREGS:$srchi, i8imm:$cnt),
		"# Lsr32 PSEUDO",
		[(set i16:$dstlo, i16:$dsthi, (AVRlsrw i16:$srclo, i16:$srchi, i8:$cnt))]>;

	def Rol8 : ShiftPseudo<(outs GPR8
	: $dst),
	(ins GPR8
Context not available.
	: $src, i8
	: $cnt))]>;

		def Asr32 : ShiftPseudo<(outs DREGS:$dstlo, DREGS:$dsthi),
		(ins DREGS:$srclo, DREGS:$srchi, i8imm:$cnt),
		"# Asr32 PSEUDO",
		[(set i16:$dstlo, i16:$dsthi, (AVRasrw i16:$srclo, i16:$srchi, i8:$cnt))]>;

	// lowered to a copy from the zero register.
	let usesCustomInserter=1 in
	def CopyZero : Pseudo<(outs GPR8:$rd), (ins), "clrz\t$rd", [(set i8:$rd, 0)]>;
Context not available.
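
The `Lsl32`, `Lsr32`, and `Asr32` pseudos take the value as two 16-bit halves, and `insertWideShift` splits those into four bytes before shifting and recombines them afterwards. The sketch below models that data flow for a one-bit left shift in plain C++ rather than with MachineInstr building. `Halves` and `shiftLeft1` are hypothetical names, and the `Carry` variable stands in for the AVR carry flag used by `add`/`adc`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the two 16-bit register pairs of the pseudo.
struct Halves {
  uint16_t Lo;
  uint16_t Hi;
};

// Shift left by one bit, byte by byte, from least to most significant byte.
// Each step models `add r, r` (first byte) or `adc r, r` (later bytes).
static Halves shiftLeft1(Halves In) {
  // Most significant byte at index 0, as in the patch.
  std::array<uint8_t, 4> B = {
      static_cast<uint8_t>(In.Hi >> 8), static_cast<uint8_t>(In.Hi),
      static_cast<uint8_t>(In.Lo >> 8), static_cast<uint8_t>(In.Lo)};
  unsigned Carry = 0;
  for (std::size_t I = 0; I < B.size(); I++) {
    std::size_t Idx = B.size() - I - 1;
    unsigned Sum = unsigned(B[Idx]) + unsigned(B[Idx]) + Carry;
    B[Idx] = static_cast<uint8_t>(Sum);
    Carry = Sum >> 8; // carry out feeds the next, more significant byte
  }
  // Recombine the bytes into two 16-bit halves (Lo first, then Hi).
  return {static_cast<uint16_t>((B[2] << 8) | B[3]),
          static_cast<uint16_t>((B[0] << 8) | B[1])};
}
```

For example, `shiftLeft1({0xffff, 0x0000})` yields `Lo == 0xfffe` and `Hi == 0x0001`: the bit shifted out of the low half arrives in the high half through the carry.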

llvm/test/CodeGen/AVR/avr-rust-issue-123.ll

Context not available.
	store i8 %tmp3, i8* getelementptr inbounds (%UInt8, %UInt8* @delayFactor, i64 0, i32 0), align 1	store i8 %tmp3, i8* getelementptr inbounds (%UInt8, %UInt8* @delayFactor, i64 0, i32 0), align 1
	%tmp4 = zext i8 %tmp3 to i32	%tmp4 = zext i8 %tmp3 to i32
	%tmp5 = mul nuw nsw i32 %tmp4, 100	%tmp5 = mul nuw nsw i32 %tmp4, 100
	; CHECK: sts delay+3, r{{[0-9]+}}	; CHECK: sts delay+1, r{{[0-9]+}}
	; CHECK-NEXT: sts delay+2, r{{[0-9]+}}
	; CHECK-NEXT: sts delay+1, r{{[0-9]+}}
	; CHECK-NEXT: sts delay, r{{[0-9]+}}	; CHECK-NEXT: sts delay, r{{[0-9]+}}
		; CHECK-NEXT: sts delay+3, r{{[0-9]+}}
		; CHECK-NEXT: sts delay+2, r{{[0-9]+}}
	store i32 %tmp5, i32* getelementptr inbounds (%UInt32, %UInt32* @delay, i64 0, i32 0), align 4	store i32 %tmp5, i32* getelementptr inbounds (%UInt32, %UInt32* @delay, i64 0, i32 0), align 4
	tail call void @eeprom_write(i16 34, i8 %tmp3)	tail call void @eeprom_write(i16 34, i8 %tmp3)
	br label %bb7	br label %bb7
Context not available.

llvm/test/CodeGen/AVR/return.ll

Context not available.
	; AVR-NEXT: push r29	; AVR-NEXT: push r29
	; AVR-NEXT: in r28, 61	; AVR-NEXT: in r28, 61
	; AVR-NEXT: in r29, 62	; AVR-NEXT: in r29, 62
	; AVR-NEXT: ldd r22, Y+5
	; AVR-NEXT: ldd r23, Y+6
	; AVR-NEXT: ldd r24, Y+7	; AVR-NEXT: ldd r24, Y+7
	; AVR-NEXT: ldd r25, Y+8	; AVR-NEXT: ldd r25, Y+8
		; AVR-NEXT: ldd r22, Y+5
		; AVR-NEXT: ldd r23, Y+6
	; AVR-NEXT: pop r29	; AVR-NEXT: pop r29
	; AVR-NEXT: pop r28	; AVR-NEXT: pop r28
	; AVR-NEXT: ret	; AVR-NEXT: ret
Context not available.
	; TINY-NEXT: push r29	; TINY-NEXT: push r29
	; TINY-NEXT: in r28, 61	; TINY-NEXT: in r28, 61
	; TINY-NEXT: in r29, 62	; TINY-NEXT: in r29, 62
	; TINY-NEXT: ldd r22, Y+13
	; TINY-NEXT: ldd r23, Y+14
	; TINY-NEXT: ldd r24, Y+15	; TINY-NEXT: ldd r24, Y+15
	; TINY-NEXT: ldd r25, Y+16	; TINY-NEXT: ldd r25, Y+16
		; TINY-NEXT: ldd r22, Y+13
		; TINY-NEXT: ldd r23, Y+14
	; TINY-NEXT: pop r29	; TINY-NEXT: pop r29
	; TINY-NEXT: pop r28	; TINY-NEXT: pop r28
	; TINY-NEXT: ret	; TINY-NEXT: ret
Context not available.

llvm/test/CodeGen/AVR/shift32.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple=avr -mattr=movw -verify-machineinstrs | FileCheck %s
				benshi001: Also check for AVRTiny?
				aykevl: The only difference is the lack of `movw`, which isn't emitted by this patch; instead it uses `AVR::COPY`, which is lowered to either `movw` or two `mov` instructions depending on support. All other instructions are supported by avrtiny, so I do not think testing for AVRTiny is necessary here. (I could add it of course if you think it's useful, but the test output will likely be near-identical.)

				; Lowering of constant 32-bit shift instructions.
				; The main reason these functions are tested separate from shift.ll is because
				; of update_llc_test_checks.py.

				define i32 @shl_i32_1(i32 %a) {
				; CHECK-LABEL: shl_i32_1:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: lsl r22
				; CHECK-NEXT: rol r23
				; CHECK-NEXT: rol r24
				; CHECK-NEXT: rol r25
				; CHECK-NEXT: ret
				%res = shl i32 %a, 1
				ret i32 %res
				}

				define i32 @shl_i32_2(i32 %a) {
				; CHECK-LABEL: shl_i32_2:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: lsl r22
				; CHECK-NEXT: rol r23
				; CHECK-NEXT: rol r24
				; CHECK-NEXT: rol r25
				; CHECK-NEXT: lsl r22
				; CHECK-NEXT: rol r23
				; CHECK-NEXT: rol r24
				; CHECK-NEXT: rol r25
				; CHECK-NEXT: ret
				%res = shl i32 %a, 2
				ret i32 %res
				}

				define i32 @shl_i32_4(i32 %a) {
				; CHECK-LABEL: shl_i32_4:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: swap r25
				; CHECK-NEXT: andi r25, 240
				; CHECK-NEXT: swap r24
				; CHECK-NEXT: eor r25, r24
				; CHECK-NEXT: andi r24, 240
				; CHECK-NEXT: eor r25, r24
				; CHECK-NEXT: swap r23
				; CHECK-NEXT: eor r24, r23
				; CHECK-NEXT: andi r23, 240
				; CHECK-NEXT: eor r24, r23
				; CHECK-NEXT: swap r22
				; CHECK-NEXT: eor r23, r22
				; CHECK-NEXT: andi r22, 240
				; CHECK-NEXT: eor r23, r22
				; CHECK-NEXT: ret
				%res = shl i32 %a, 4
				ret i32 %res
				}

				define i32 @shl_i32_5(i32 %a) {
				; CHECK-LABEL: shl_i32_5:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: swap r25
				; CHECK-NEXT: andi r25, 240
				; CHECK-NEXT: swap r24
				; CHECK-NEXT: eor r25, r24
				; CHECK-NEXT: andi r24, 240
				; CHECK-NEXT: eor r25, r24
				; CHECK-NEXT: swap r23
				; CHECK-NEXT: eor r24, r23
				; CHECK-NEXT: andi r23, 240
				; CHECK-NEXT: eor r24, r23
				; CHECK-NEXT: swap r22
				; CHECK-NEXT: eor r23, r22
				; CHECK-NEXT: andi r22, 240
				; CHECK-NEXT: eor r23, r22
				; CHECK-NEXT: lsl r22
				; CHECK-NEXT: rol r23
				; CHECK-NEXT: rol r24
				; CHECK-NEXT: rol r25
				; CHECK-NEXT: ret
				%res = shl i32 %a, 5
				ret i32 %res
				}

				define i32 @shl_i32_6(i32 %a) {
				; CHECK-LABEL: shl_i32_6:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: lsr r25
				; CHECK-NEXT: ror r24
				; CHECK-NEXT: ror r23
				; CHECK-NEXT: ror r22
				; CHECK-NEXT: mov r18, r1
				; CHECK-NEXT: ror r18
				; CHECK-NEXT: lsr r25
				; CHECK-NEXT: ror r24
				; CHECK-NEXT: ror r23
				; CHECK-NEXT: ror r22
				; CHECK-NEXT: ror r18
				; CHECK-NEXT: mov r25, r24
				; CHECK-NEXT: mov r24, r23
				; CHECK-NEXT: mov r19, r22
				; CHECK-NEXT: movw r22, r18
				; CHECK-NEXT: ret
				%res = shl i32 %a, 6
				ret i32 %res
				}


				define i32 @shl_i32_7(i32 %a) {
				; CHECK-LABEL: shl_i32_7:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: lsr r25
				; CHECK-NEXT: ror r24
				; CHECK-NEXT: ror r23
				; CHECK-NEXT: ror r22
				; CHECK-NEXT: mov r18, r1
				; CHECK-NEXT: ror r18
				; CHECK-NEXT: mov r25, r24
				; CHECK-NEXT: mov r24, r23
				; CHECK-NEXT: mov r19, r22
				; CHECK-NEXT: movw r22, r18
				; CHECK-NEXT: ret
				%res = shl i32 %a, 7
				ret i32 %res
				}

				define i32 @shl_i32_8(i32 %a) {
				; CHECK-LABEL: shl_i32_8:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: mov r25, r24
				; CHECK-NEXT: mov r24, r23
				; CHECK-NEXT: mov r23, r22
				; CHECK-NEXT: mov r22, r1
				; CHECK-NEXT: ret
				%res = shl i32 %a, 8
				ret i32 %res
				}

				define i32 @shl_i32_15(i32 %a) {
				; CHECK-LABEL: shl_i32_15:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: movw r18, r22
				; CHECK-NEXT: lsr r24
				; CHECK-NEXT: ror r19
				; CHECK-NEXT: ror r18
				; CHECK-NEXT: mov r23, r1
				; CHECK-NEXT: ror r23
				; CHECK-NEXT: mov r22, r1
				; CHECK-NEXT: movw r24, r18
				; CHECK-NEXT: ret
				%res = shl i32 %a, 15
				ret i32 %res
				}

				; Combined with the register allocator, shift instructions can sometimes be
				; optimized away entirely. The least significant registers are simply stored
				; directly instead of moving them first.
				; TODO: the `mov Rd, r1` instructions are needed because most of the
				; instructions are 16-bits and instructions are only split after register
				; allocation. These two instructions could be avoided if the 16-bit store
				; instruction was split into two 8-bit store instructions before register
				; allocation. That would make this shift a no-op.
				define void @shl_i32_16_ptr(i32 %a, ptr %ptr) {
				; CHECK-LABEL: shl_i32_16_ptr:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: mov r25, r1
				; CHECK-NEXT: mov r24, r1
				; CHECK-NEXT: movw r30, r20
				; CHECK-NEXT: std Z+2, r22
				; CHECK-NEXT: std Z+3, r23
				; CHECK-NEXT: st Z, r24
				; CHECK-NEXT: std Z+1, r25
				; CHECK-NEXT: ret
				%res = shl i32 %a, 16
				store i32 %res, ptr %ptr
				ret void
				}

				define i32 @shl_i32_28(i32 %a) {
				; CHECK-LABEL: shl_i32_28:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: swap r22
				; CHECK-NEXT: andi r22, 240
				; CHECK-NEXT: mov r25, r22
				; CHECK-NEXT: mov r24, r1
				; CHECK-NEXT: mov r23, r1
				; CHECK-NEXT: mov r22, r1
				; CHECK-NEXT: ret
				%res = shl i32 %a, 28
				ret i32 %res
				}

				define i32 @shl_i32_31(i32 %a) {
				; CHECK-LABEL: shl_i32_31:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: lsr r22
				; CHECK-NEXT: mov r25, r1
				; CHECK-NEXT: ror r25
				; CHECK-NEXT: mov r24, r1
				; CHECK-NEXT: mov r23, r1
				; CHECK-NEXT: mov r22, r1
				; CHECK-NEXT: ret
				%res = shl i32 %a, 31
				ret i32 %res
				}

				define i32 @lshr_i32_1(i32 %a) {
				; CHECK-LABEL: lshr_i32_1:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: lsr r25
				; CHECK-NEXT: ror r24
				; CHECK-NEXT: ror r23
				; CHECK-NEXT: ror r22
				; CHECK-NEXT: ret
				%res = lshr i32 %a, 1
				ret i32 %res
				}

				define i32 @lshr_i32_2(i32 %a) {
				; CHECK-LABEL: lshr_i32_2:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: lsr r25
				; CHECK-NEXT: ror r24
				; CHECK-NEXT: ror r23
				; CHECK-NEXT: ror r22
				; CHECK-NEXT: lsr r25
				; CHECK-NEXT: ror r24
				; CHECK-NEXT: ror r23
				; CHECK-NEXT: ror r22
				; CHECK-NEXT: ret
				%res = lshr i32 %a, 2
				ret i32 %res
				}

				define i32 @lshr_i32_4(i32 %a) {
				; CHECK-LABEL: lshr_i32_4:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: swap r22
				; CHECK-NEXT: andi r22, 15
				; CHECK-NEXT: swap r23
				; CHECK-NEXT: eor r22, r23
				; CHECK-NEXT: andi r23, 15
				; CHECK-NEXT: eor r22, r23
				; CHECK-NEXT: swap r24
				; CHECK-NEXT: eor r23, r24
				; CHECK-NEXT: andi r24, 15
				; CHECK-NEXT: eor r23, r24
				; CHECK-NEXT: swap r25
				; CHECK-NEXT: eor r24, r25
				; CHECK-NEXT: andi r25, 15
				; CHECK-NEXT: eor r24, r25
				; CHECK-NEXT: ret
				%res = lshr i32 %a, 4
				ret i32 %res
				}

				define i32 @lshr_i32_6(i32 %a) {
				; CHECK-LABEL: lshr_i32_6:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: lsl r22
				; CHECK-NEXT: rol r23
				; CHECK-NEXT: rol r24
				; CHECK-NEXT: rol r25
				; CHECK-NEXT: mov r19, r1
				; CHECK-NEXT: rol r19
				; CHECK-NEXT: lsl r22
				; CHECK-NEXT: rol r23
				; CHECK-NEXT: rol r24
				; CHECK-NEXT: rol r25
				; CHECK-NEXT: rol r19
				; CHECK-NEXT: mov r22, r23
				; CHECK-NEXT: mov r23, r24
				; CHECK-NEXT: mov r18, r25
				; CHECK-NEXT: movw r24, r18
				; CHECK-NEXT: ret
				%res = lshr i32 %a, 6
				ret i32 %res
				}

				define i32 @lshr_i32_7(i32 %a) {
				; CHECK-LABEL: lshr_i32_7:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: lsl r22
				; CHECK-NEXT: rol r23
				; CHECK-NEXT: rol r24
				; CHECK-NEXT: rol r25
				; CHECK-NEXT: mov r19, r1
				; CHECK-NEXT: rol r19
				; CHECK-NEXT: mov r22, r23
				; CHECK-NEXT: mov r23, r24
				; CHECK-NEXT: mov r18, r25
				; CHECK-NEXT: movw r24, r18
				; CHECK-NEXT: ret
				%res = lshr i32 %a, 7
				ret i32 %res
				}

				define i32 @lshr_i32_8(i32 %a) {
				; CHECK-LABEL: lshr_i32_8:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: mov r22, r23
				; CHECK-NEXT: mov r23, r24
				; CHECK-NEXT: mov r24, r25
				; CHECK-NEXT: mov r25, r1
				; CHECK-NEXT: ret
				%res = lshr i32 %a, 8
				ret i32 %res
				}

				define i32 @lshr_i32_24(i32 %a) {
				; CHECK-LABEL: lshr_i32_24:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: mov r22, r25
				; CHECK-NEXT: mov r23, r1
				; CHECK-NEXT: mov r24, r1
				; CHECK-NEXT: mov r25, r1
				; CHECK-NEXT: ret
				%res = lshr i32 %a, 24
				ret i32 %res
				}

				define i32 @lshr_i32_31(i32 %a) {
				; CHECK-LABEL: lshr_i32_31:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: lsl r25
				; CHECK-NEXT: mov r22, r1
				; CHECK-NEXT: rol r22
				; CHECK-NEXT: mov r23, r1
				; CHECK-NEXT: mov r24, r1
				; CHECK-NEXT: mov r25, r1
				; CHECK-NEXT: ret
				%res = lshr i32 %a, 31
				ret i32 %res
				}

				define i32 @ashr_i32_1(i32 %a) {
				; CHECK-LABEL: ashr_i32_1:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: asr r25
				; CHECK-NEXT: ror r24
				; CHECK-NEXT: ror r23
				; CHECK-NEXT: ror r22
				; CHECK-NEXT: ret
				%res = ashr i32 %a, 1
				ret i32 %res
				}

				define i32 @ashr_i32_2(i32 %a) {
				; CHECK-LABEL: ashr_i32_2:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: asr r25
				; CHECK-NEXT: ror r24
				; CHECK-NEXT: ror r23
				; CHECK-NEXT: ror r22
				; CHECK-NEXT: asr r25
				; CHECK-NEXT: ror r24
				; CHECK-NEXT: ror r23
				; CHECK-NEXT: ror r22
				; CHECK-NEXT: ret
				%res = ashr i32 %a, 2
				ret i32 %res
				}

				; can't use the swap/andi/eor trick here
				define i32 @ashr_i32_4(i32 %a) {
				; CHECK-LABEL: ashr_i32_4:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: asr r25
				; CHECK-NEXT: ror r24
				; CHECK-NEXT: ror r23
				; CHECK-NEXT: ror r22
				; CHECK-NEXT: asr r25
				; CHECK-NEXT: ror r24
				; CHECK-NEXT: ror r23
				; CHECK-NEXT: ror r22
				; CHECK-NEXT: asr r25
				; CHECK-NEXT: ror r24
				; CHECK-NEXT: ror r23
				; CHECK-NEXT: ror r22
				; CHECK-NEXT: asr r25
				; CHECK-NEXT: ror r24
				; CHECK-NEXT: ror r23
				; CHECK-NEXT: ror r22
				; CHECK-NEXT: ret
				%res = ashr i32 %a, 4
				ret i32 %res
				}

				define i32 @ashr_i32_7(i32 %a) {
				; CHECK-LABEL: ashr_i32_7:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: lsl r22
				; CHECK-NEXT: rol r23
				; CHECK-NEXT: rol r24
				; CHECK-NEXT: rol r25
				; CHECK-NEXT: sbc r19, r19
				; CHECK-NEXT: mov r22, r23
				; CHECK-NEXT: mov r23, r24
				; CHECK-NEXT: mov r18, r25
				; CHECK-NEXT: movw r24, r18
				; CHECK-NEXT: ret
				%res = ashr i32 %a, 7
				ret i32 %res
				}

				; TODO: this could be optimized to 4 movs, instead of five.
				define i32 @ashr_i32_8(i32 %a) {
				; CHECK-LABEL: ashr_i32_8:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: mov r19, r25
				; CHECK-NEXT: lsl r19
				; CHECK-NEXT: sbc r19, r19
				; CHECK-NEXT: mov r22, r23
				; CHECK-NEXT: mov r23, r24
				; CHECK-NEXT: mov r18, r25
				; CHECK-NEXT: movw r24, r18
				; CHECK-NEXT: ret
				%res = ashr i32 %a, 8
				ret i32 %res
				}

				define i32 @ashr_i32_16(i32 %a) {
				; CHECK-LABEL: ashr_i32_16:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: movw r22, r24
				; CHECK-NEXT: lsl r25
				; CHECK-NEXT: sbc r25, r25
				; CHECK-NEXT: mov r24, r25
				; CHECK-NEXT: ret
				%res = ashr i32 %a, 16
				ret i32 %res
				}

				define i32 @ashr_i32_23(i32 %a) {
				; CHECK-LABEL: ashr_i32_23:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: lsl r24
				; CHECK-NEXT: rol r25
				; CHECK-NEXT: sbc r23, r23
				; CHECK-NEXT: mov r22, r25
				; CHECK-NEXT: mov r24, r23
				; CHECK-NEXT: mov r25, r23
				; CHECK-NEXT: ret
				%res = ashr i32 %a, 23
				ret i32 %res
				}

				define i32 @ashr_i32_30(i32 %a) {
				; CHECK-LABEL: ashr_i32_30:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: lsl r25
				; CHECK-NEXT: sbc r23, r23
				; CHECK-NEXT: lsl r25
				; CHECK-NEXT: mov r22, r23
				; CHECK-NEXT: rol r22
				; CHECK-NEXT: mov r24, r23
				; CHECK-NEXT: mov r25, r23
				; CHECK-NEXT: ret
				%res = ashr i32 %a, 30
				ret i32 %res
				}

				define i32 @ashr_i32_31(i32 %a) {
				; CHECK-LABEL: ashr_i32_31:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: lsl r25
				; CHECK-NEXT: sbc r22, r22
				; CHECK-NEXT: mov r23, r22
				; CHECK-NEXT: movw r24, r22
				; CHECK-NEXT: ret
				%res = ashr i32 %a, 31
				ret i32 %res
				}
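
Several of the `ashr` tests above rely on the pair `lsl rX` / `sbc rY, rY` to materialize the sign byte: `lsl` moves bit 7 into the carry flag, and `sbc rY, rY` computes `rY - rY - C`, which is either 0x00 or 0xff. A small host-side sketch of the `ashr_i32_31` sequence follows; `signFill` and `ashr31Bits` are hypothetical helper names, and the result is kept as an unsigned bit pattern to avoid relying on signed conversions.

```cpp
#include <cassert>
#include <cstdint>

// Models `lsl r; sbc d, d`: returns 0xff if bit 7 of R is set, else 0x00.
static uint8_t signFill(uint8_t R) {
  unsigned Carry = R >> 7;                 // lsl: bit 7 goes to the carry flag
  return static_cast<uint8_t>(0u - Carry); // sbc d, d: d - d - C
}

// Arithmetic shift right by 31: every result byte equals the sign fill byte.
static uint32_t ashr31Bits(uint32_t Value) {
  uint8_t Top = static_cast<uint8_t>(Value >> 24);
  uint8_t Fill = signFill(Top);
  return Fill * 0x01010101u; // replicate the fill byte into all four bytes
}
```

This matches the expected output of `ashr_i32_31`, where one `sbc` result is copied into all four result registers.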

Diff 480081