This is an archive of the discontinued LLVM Phabricator instance.

Paths

clang/docs/tools/clang-formatted-files.txt
llvm/lib/Target/AVR/AVR.h
llvm/lib/Target/AVR/AVRExpandPseudoInsts.cpp
llvm/lib/Target/AVR/AVRISelLowering.cpp
llvm/lib/Target/AVR/AVRShiftExpand.cpp
llvm/lib/Target/AVR/AVRTargetMachine.cpp
llvm/lib/Target/AVR/CMakeLists.txt
llvm/test/CodeGen/AVR/shift-expand.ll
llvm/test/CodeGen/AVR/shift-loop.ll
llvm/test/CodeGen/AVR/shift32.ll
llvm/utils/gn/secondary/llvm/lib/Target/AVR/BUILD.gn

Differential D153197

[AVR] Expand shifts during AVRISelLowering
AbandonedPublic

Authored by Patryk27 on Jun 17 2023, 4:30 AM.

Details

Reviewers

benshi001

Summary

Some passes can introduce shifts after AVRShiftExpandPass has completed;
if this happens, we panic during isel because we assume such shifts have
already been expanded.

This commit integrates our shift-expansion pass with the instruction-selection
pass so that isel is no longer surprised by shifts of non-constant amounts.

Spotted in the wild in rustc:

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Patryk27 created this revision. Jun 17 2023, 4:30 AM

Herald added a project: Restricted Project. Jun 17 2023, 4:30 AM

Herald added subscribers: Jim, JDevlieghere, hiraditya, dylanmckay.

Patryk27 requested review of this revision. Jun 17 2023, 4:30 AM

Herald added projects: Restricted Project, Restricted Project. Jun 17 2023, 4:30 AM

Herald added subscribers: llvm-commits, cfe-commits.

Patryk27 mentioned this in D152059: [AVR] Replace shift-to-loop IR pass with common shift code. Jun 17 2023, 4:31 AM

Patryk27 added inline comments. Jun 17 2023, 4:33 AM

llvm/lib/Target/AVR/AVRISelLowering.cpp

2193

Note that I've changed the code to rearrange the generated blocks a bit, from:

body:             |
  bb.0 (%ir-block.0):
    successors: %bb.2(0x80000000)
    liveins: $r23r22, $r25r24, $r19r18
  
    %2:dregs = COPY $r19r18
    %1:dregs = COPY $r25r24
    %0:dregs = COPY $r23r22
    %4:gpr8 = COPY %2.sub_lo
    RJMPk %bb.2
  
  bb.1 (%ir-block.0):
    successors: %bb.2(0x80000000)
  
    %12:gpr8 = ADDRdRr %10, %10, implicit-def $sreg
    %13:gpr8 = ADCRdRr %9, %9, implicit-def $sreg, implicit $sreg
    %14:gpr8 = ADCRdRr %8, %8, implicit-def $sreg, implicit $sreg
    %15:gpr8 = ADCRdRr %7, %7, implicit-def $sreg, implicit $sreg
  
  bb.2 (%ir-block.0):
    successors: %bb.1(0x40000000), %bb.3(0x40000000)
  
    %7:gpr8 = PHI %1.sub_hi, %bb.0, %15, %bb.1
    %8:gpr8 = PHI %1.sub_lo, %bb.0, %14, %bb.1
    %9:gpr8 = PHI %0.sub_hi, %bb.0, %13, %bb.1
    %10:gpr8 = PHI %0.sub_lo, %bb.0, %12, %bb.1
    %16:gpr8 = PHI %4, %bb.0, %17, %bb.1
    %17:gpr8 = DECRd %16, implicit-def $sreg
    BRPLk %bb.1, implicit $sreg
  
  bb.3 (%ir-block.0):
    %6:dregs = REG_SEQUENCE %7, %subreg.sub_hi, %8, %subreg.sub_lo
    %5:dregs = REG_SEQUENCE %9, %subreg.sub_hi, %10, %subreg.sub_lo
    $r23r22 = COPY %5
    $r25r24 = COPY %6
    RET implicit $r23r22, implicit $r25r24, implicit $r1

... to:

body:             |
  bb.0 (%ir-block.0):
    successors: %bb.1(0x80000000)
    liveins: $r23r22, $r25r24, $r19r18
  
    %2:dregs = COPY $r19r18
    %1:dregs = COPY $r25r24
    %0:dregs = COPY $r23r22
    %4:gpr8 = COPY %2.sub_lo
    # fall-through instead of jumping
  
  bb.1 (%ir-block.0):
    successors: %bb.2(0x40000000), %bb.3(0x40000000)
  
    %7:gpr8 = PHI %1.sub_hi, %bb.0, %15, %bb.2
    %8:gpr8 = PHI %1.sub_lo, %bb.0, %14, %bb.2
    %9:gpr8 = PHI %0.sub_hi, %bb.0, %13, %bb.2
    %10:gpr8 = PHI %0.sub_lo, %bb.0, %12, %bb.2
    %16:gpr8 = PHI %4, %bb.0, %17, %bb.2
    %17:gpr8 = DECRd %16, implicit-def $sreg
    BRMIk %bb.3, implicit $sreg # <- reversed comparison + fallthrough
  
  bb.2 (%ir-block.0):
    successors: %bb.1(0x80000000)
  
    %12:gpr8 = ADDRdRr %10, %10, implicit-def $sreg
    %13:gpr8 = ADCRdRr %9, %9, implicit-def $sreg, implicit $sreg
    %14:gpr8 = ADCRdRr %8, %8, implicit-def $sreg, implicit $sreg
    %15:gpr8 = ADCRdRr %7, %7, implicit-def $sreg, implicit $sreg
    RJMPk %bb.1 # <- jump to the beginning
  
  bb.3 (%ir-block.0):
    %6:dregs = REG_SEQUENCE %7, %subreg.sub_hi, %8, %subreg.sub_lo
    %5:dregs = REG_SEQUENCE %9, %subreg.sub_hi, %10, %subreg.sub_lo
    $r23r22 = COPY %5
    $r25r24 = COPY %6
    RET implicit $r23r22, implicit $r25r24, implicit $r1

The generated assembly appears to be unchanged; I've also checked the actual binary through rustc + simavr.
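The control flow of the rearranged blocks can be modeled in plain C++. This is only an illustrative sketch, not code from the patch: `shiftLeftLoop` is a hypothetical name, the byte array stands in for the four `gpr8` virtual registers, and testing bit 7 of the counter stands in for the N flag that `BRMIk` reads after `DECRd`.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical model of the rearranged loop; not LLVM code.
// bytes[0] is the least significant byte (%10 in the MIR above).
std::uint32_t shiftLeftLoop(std::uint32_t value, std::uint8_t amount) {
  std::array<std::uint8_t, 4> bytes{};
  for (std::size_t i = 0; i < bytes.size(); ++i)
    bytes[i] = static_cast<std::uint8_t>(value >> (8 * i));

  std::uint8_t counter = amount;
  for (;;) {
    // bb.1: DECRd, then BRMIk leaves the loop once bit 7 (the N flag) is set.
    counter = static_cast<std::uint8_t>(counter - 1);
    if (counter & 0x80)
      break;

    // bb.2: ADDRdRr on the low byte, ADCRdRr on the rest (value = value * 2).
    unsigned carry = 0;
    for (std::uint8_t &byte : bytes) {
      unsigned doubled = 2u * byte + carry;
      byte = static_cast<std::uint8_t>(doubled);
      carry = doubled >> 8;
    }
    // RJMPk %bb.1: jump back to the loop header.
  }

  // bb.3: reassemble the result (the REG_SEQUENCE instructions in the MIR).
  std::uint32_t result = 0;
  for (std::size_t i = 0; i < bytes.size(); ++i)
    result |= static_cast<std::uint32_t>(bytes[i]) << (8 * i);
  return result;
}
```

Both block orderings compute the same thing in this model: the counter is decremented first, so a shift amount of zero exits before any byte is touched.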

Patryk27 added a reviewer: benshi001. Jun 17 2023, 4:33 AM

Harbormaster completed remote builds in B239594: Diff 532386. Jun 17 2023, 5:31 AM

Patryk27 added inline comments. Jun 18 2023, 4:25 AM

llvm/lib/Target/AVR/AVRISelLowering.cpp
2207	Alright, this is wrong after all: I've just tested it on more elaborate code in rustc, and `EntryBB->removeSuccessor(ExitBB);` triggers an LLVM panic (presumably because EntryBB == ExitBB).

I don't quite understand why doing something like this:

MachineBasicBlock *ExitBB = EntryBB->splitAt(MI, false);

if (EntryBB == ExitBB) {
  assert(EntryBB->canFallThrough() && "Expected a fallthrough block!");
  ExitBB = EntryBB->getFallThrough();
}

... is not sufficient, though 👀

benshi001 added inline comments. Jun 20 2023, 1:47 AM

llvm/lib/Target/AVR/AVRISelLowering.cpp
2207	Is it possible to fix the 32-bit shift issue in a more moderate way? For example, by keeping the pass in `AVRShiftExpand.cpp`.

Now that I think about it, there might be a simpler way!

So, the underlying issue is that LLVM sometimes creates new shifts during the instruction selection pass. For instance, given this IR¹:

define i64 @test(i64 %x, i32 %y) {
start:
  %0 = or i32 %y, 38
  %1 = zext i32 %0 to i64
  %2 = lshr i64 %x, %1
  ret i64 %2
}

... this 64-bit shift will not get considered by our shift-expansion pass due to a condition here:

https://github.com/llvm/llvm-project/blob/eaaacc3c651e5b2c23bfa9648b6b0d69aab64d00/llvm/lib/Target/AVR/AVRShiftExpand.cpp#L54

... but later, during isel, LLVM will reduce this 64-bit shift into a 32-bit shift² and then ask us to expand this brand new 32-bit shift - that causes a panic since we expect those to have been already expanded.
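One possible reading of why the narrowing is legal (an assumption on my part; footnote ² notes that the exact cause was not confirmed): `or i32 %y, 38` always sets bit 5, since 38 = 32 + 4 + 2, so every in-range shift amount (below 64) is at least 32. For such amounts, a 64-bit logical right shift only depends on the upper 32 bits of the input. The sketch below checks that arithmetic identity in plain C++; both function names are hypothetical and nothing here is LLVM code.

```cpp
#include <cstdint>

// Reference: the full 64-bit logical shift right (amount must be below 64).
std::uint64_t lshr64(std::uint64_t x, unsigned amount) {
  return x >> amount;
}

// Narrowed form: valid only when 32 <= amount < 64.
// The low half of x is shifted out entirely, so only a 32-bit shift of the
// high half remains; the upper 32 bits of the result are zero.
std::uint64_t lshr64ViaLshr32(std::uint64_t x, unsigned amount) {
  std::uint32_t high = static_cast<std::uint32_t>(x >> 32);
  return static_cast<std::uint64_t>(high >> (amount - 32));
}
```

Under that reading, the new 32-bit shift still has a non-constant amount, which is exactly the case the existing isel code reports as unreachable.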

In order to solve this issue, I think we could simply expand all variable shifts wider than i8 (instead of expanding only 32-bit shifts):

diff --git a/llvm/lib/Target/AVR/AVRShiftExpand.cpp b/llvm/lib/Target/AVR/AVRShiftExpand.cpp
index b7dcd860467d..4ea6d9fdb57c 100644
--- a/llvm/lib/Target/AVR/AVRShiftExpand.cpp
+++ b/llvm/lib/Target/AVR/AVRShiftExpand.cpp
@@ -51,8 +51,7 @@ bool AVRShiftExpand::runOnFunction(Function &F) {
     if (!I.isShift())
       // Only expand shift instructions (shl, lshr, ashr).
       continue;
-    if (I.getType() != Type::getInt32Ty(Ctx))
-      // Only expand plain i32 types.
+    if (I.getType() == Type::getInt8Ty(Ctx))
       continue;
     if (isa<ConstantInt>(I.getOperand(1)))
       // Only expand when the shift amount is not known.
@@ -75,7 +74,7 @@ bool AVRShiftExpand::runOnFunction(Function &F) {
 void AVRShiftExpand::expand(BinaryOperator *BI) {
   auto &Ctx = BI->getContext();
   IRBuilder<> Builder(BI);
-  Type *Int32Ty = Type::getInt32Ty(Ctx);
+  Type *InputTy = cast<Instruction>(BI)->getType();
   Type *Int8Ty = Type::getInt8Ty(Ctx);
   Value *Int8Zero = ConstantInt::get(Int8Ty, 0);
 
@@ -101,7 +100,7 @@ void AVRShiftExpand::expand(BinaryOperator *BI) {
   Builder.SetInsertPoint(LoopBB);
   PHINode *ShiftAmountPHI = Builder.CreatePHI(Int8Ty, 2);
   ShiftAmountPHI->addIncoming(ShiftAmount, BB);
-  PHINode *ValuePHI = Builder.CreatePHI(Int32Ty, 2);
+  PHINode *ValuePHI = Builder.CreatePHI(InputTy, 2);
   ValuePHI->addIncoming(BI->getOperand(0), BB);
 
   // Subtract the shift amount by one, as we're shifting one this loop
@@ -116,13 +115,13 @@ void AVRShiftExpand::expand(BinaryOperator *BI) {
   Value *ValueShifted;
   switch (BI->getOpcode()) {
   case Instruction::Shl:
-    ValueShifted = Builder.CreateShl(ValuePHI, ConstantInt::get(Int32Ty, 1));
+    ValueShifted = Builder.CreateShl(ValuePHI, ConstantInt::get(InputTy, 1));
     break;
   case Instruction::LShr:
-    ValueShifted = Builder.CreateLShr(ValuePHI, ConstantInt::get(Int32Ty, 1));
+    ValueShifted = Builder.CreateLShr(ValuePHI, ConstantInt::get(InputTy, 1));
     break;
   case Instruction::AShr:
-    ValueShifted = Builder.CreateAShr(ValuePHI, ConstantInt::get(Int32Ty, 1));
+    ValueShifted = Builder.CreateAShr(ValuePHI, ConstantInt::get(InputTy, 1));
     break;
   default:
     llvm_unreachable("asked to expand an instruction that is not a shift");
@@ -137,7 +136,7 @@ void AVRShiftExpand::expand(BinaryOperator *BI) {
   // Collect the resulting value. This is necessary in the IR but won't produce
   // any actual instructions.
   Builder.SetInsertPoint(BI);
-  PHINode *Result = Builder.CreatePHI(Int32Ty, 2);
+  PHINode *Result = Builder.CreatePHI(InputTy, 2);
   Result->addIncoming(BI->getOperand(0), BB);
   Result->addIncoming(ValueShifted, LoopBB);
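The effect of replacing `Int32Ty` with `InputTy` can be pictured as turning a 32-bit-only loop into one that is generic over the value width. The following is a rough C++ model of that loop, not the pass itself: `expandShift` and `ShiftKind` are hypothetical names, the 8-bit counter mirrors `Int8Ty`, and the initial zero check is an assumption about the parts of the pass that the diff context does not show.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

enum class ShiftKind { Shl, LShr, AShr };

// Hypothetical model: shift `value` by `amount`, one bit per iteration,
// for any unsigned width T (standing in for i16, i32, i64, ...).
template <typename T>
T expandShift(ShiftKind kind, T value, std::uint8_t amount) {
  static_assert(std::is_unsigned_v<T>, "model uses unsigned storage");
  constexpr int bits = std::numeric_limits<T>::digits;
  const T signMask = static_cast<T>(T{1} << (bits - 1));

  while (amount != 0) {
    // Decrement the 8-bit counter, as we shift by one bit this iteration.
    amount = static_cast<std::uint8_t>(amount - 1);
    switch (kind) {
    case ShiftKind::Shl:
      value = static_cast<T>(value << 1);
      break;
    case ShiftKind::LShr:
      value = static_cast<T>(value >> 1);
      break;
    case ShiftKind::AShr:
      // Arithmetic shift: keep the sign bit in place.
      value = static_cast<T>((value >> 1) | (value & signMask));
      break;
    }
  }
  return value;
}
```

For example, `expandShift<std::uint64_t>(ShiftKind::LShr, x, n)` models the 64-bit loop that the modified pass would emit for the IR above.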

Overall, this solution seems to work: I've checked it on a couple of Rust applications, both simple and more complex ones, and the binaries behave correctly.

I think the only disadvantage of this approach, compared to expanding shifts during isel, is lost optimization opportunities: expanding shifts eagerly means that we'll lose some of the optimizations we could have applied otherwise.

For instance, in that first example from Rust's standard library, we will expand the instruction into a 64-bit shift even though in principle a 32-bit shift would suffice (but we don't know that yet during the shift-expansion pass).

For safety, we could implement this simpler approach first, as presented in the diff here, and maybe come back to merging shift-expansion with isel in the future. What do you think?

¹ minimized case from Rust's standard library - originally: _ZN4core3num7dec2flt6lemire13compute_float17hc1d4de6247502c96E
² I'm not 100% sure why, though; it seems to be related to this `or` + `zext` combination

ping ping, @benshi001 👀

In D153197#4483230, @Patryk27 wrote:

ping ping, @benshi001 👀

Could you please upload the final version of this patch? I see you have made some changes, but they are only mentioned in your comment.

Superseded by:

https://reviews.llvm.org/D154785

Revision Contents

Path	Size
clang/docs/tools/clang-formatted-files.txt	1 line
llvm/lib/Target/AVR/AVR.h	2 lines
llvm/lib/Target/AVR/AVRExpandPseudoInsts.cpp	5 lines
	205 lines
	11 lines
	1 line
llvm/test/CodeGen/AVR/shift-expand.ll
llvm/test/CodeGen/AVR/shift-loop.ll	46 lines
llvm/test/CodeGen/AVR/shift32.ll	63 lines
llvm/utils/gn/secondary/llvm/lib/Target/AVR/BUILD.gn	1 line

Diff 532386

clang/docs/tools/clang-formatted-files.txt

	Show First 20 Lines • Show All 6,378 Lines • ▼ Show 20 Lines
	llvm/lib/Target/AVR/AVRISelLowering.cpp			llvm/lib/Target/AVR/AVRISelLowering.cpp
	llvm/lib/Target/AVR/AVRISelLowering.h			llvm/lib/Target/AVR/AVRISelLowering.h
	llvm/lib/Target/AVR/AVRMachineFunctionInfo.h			llvm/lib/Target/AVR/AVRMachineFunctionInfo.h
	llvm/lib/Target/AVR/AVRMCInstLower.cpp			llvm/lib/Target/AVR/AVRMCInstLower.cpp
	llvm/lib/Target/AVR/AVRMCInstLower.h			llvm/lib/Target/AVR/AVRMCInstLower.h
	llvm/lib/Target/AVR/AVRRegisterInfo.cpp			llvm/lib/Target/AVR/AVRRegisterInfo.cpp
	llvm/lib/Target/AVR/AVRRegisterInfo.h			llvm/lib/Target/AVR/AVRRegisterInfo.h
	llvm/lib/Target/AVR/AVRSelectionDAGInfo.h			llvm/lib/Target/AVR/AVRSelectionDAGInfo.h
	llvm/lib/Target/AVR/AVRShiftExpand.cpp
	llvm/lib/Target/AVR/AVRSubtarget.cpp			llvm/lib/Target/AVR/AVRSubtarget.cpp
	llvm/lib/Target/AVR/AVRSubtarget.h			llvm/lib/Target/AVR/AVRSubtarget.h
	llvm/lib/Target/AVR/AVRTargetMachine.cpp			llvm/lib/Target/AVR/AVRTargetMachine.cpp
	llvm/lib/Target/AVR/AVRTargetMachine.h			llvm/lib/Target/AVR/AVRTargetMachine.h
	llvm/lib/Target/AVR/AVRTargetObjectFile.cpp			llvm/lib/Target/AVR/AVRTargetObjectFile.cpp
	llvm/lib/Target/AVR/AVRTargetObjectFile.h			llvm/lib/Target/AVR/AVRTargetObjectFile.h
	llvm/lib/Target/AVR/AsmParser/AVRAsmParser.cpp			llvm/lib/Target/AVR/AsmParser/AVRAsmParser.cpp
	llvm/lib/Target/AVR/Disassembler/AVRDisassembler.cpp			llvm/lib/Target/AVR/Disassembler/AVRDisassembler.cpp
	▲ Show 20 Lines • Show All 2,446 Lines • Show Last 20 Lines

llvm/lib/Target/AVR/AVR.h

	//===-- AVR.h - Top-level interface for AVR representation ------- C++ --===//			//===-- AVR.h - Top-level interface for AVR representation ------- C++ --===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	Show All 11 Lines
	#include "llvm/Target/TargetMachine.h"			#include "llvm/Target/TargetMachine.h"

	namespace llvm {			namespace llvm {

	class AVRTargetMachine;			class AVRTargetMachine;
	class FunctionPass;			class FunctionPass;
	class PassRegistry;			class PassRegistry;

	Pass *createAVRShiftExpandPass();
	FunctionPass *createAVRISelDag(AVRTargetMachine &TM,			FunctionPass *createAVRISelDag(AVRTargetMachine &TM,
	CodeGenOpt::Level OptLevel);			CodeGenOpt::Level OptLevel);
	FunctionPass *createAVRExpandPseudoPass();			FunctionPass *createAVRExpandPseudoPass();
	FunctionPass *createAVRFrameAnalyzerPass();			FunctionPass *createAVRFrameAnalyzerPass();
	FunctionPass *createAVRBranchSelectionPass();			FunctionPass *createAVRBranchSelectionPass();

	void initializeAVRDAGToDAGISelPass(PassRegistry &);			void initializeAVRDAGToDAGISelPass(PassRegistry &);
	void initializeAVRExpandPseudoPass(PassRegistry &);			void initializeAVRExpandPseudoPass(PassRegistry &);
	void initializeAVRShiftExpandPass(PassRegistry &);

	/// Contains the AVR backend.			/// Contains the AVR backend.
	namespace AVR {			namespace AVR {

	/// An integer that identifies all of the supported AVR address spaces.			/// An integer that identifies all of the supported AVR address spaces.
	enum AddressSpace {			enum AddressSpace {
	DataMemory,			DataMemory,
	ProgramMemory,			ProgramMemory,
	▲ Show 20 Lines • Show All 54 Lines • Show Last 20 Lines

llvm/lib/Target/AVR/AVRExpandPseudoInsts.cpp

//===-- AVRExpandPseudoInsts.cpp - Expand pseudo instructions -------------===//		//===-- AVRExpandPseudoInsts.cpp - Expand pseudo instructions -------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
▲ Show 20 Lines • Show All 215 Lines • ▼ Show 20 Lines	bool AVRExpandPseudo::isLogicImmOpRedundant(unsigned Op,

// ORI Rd, 0x0 is redundant.		// ORI Rd, 0x0 is redundant.
if (Op == AVR::ORIRdK && ImmVal == 0x0)		if (Op == AVR::ORIRdK && ImmVal == 0x0)
return true;		return true;

return false;		return false;
}		}

		/// Returns whether given logic operation effectively does not depend its
		/// first argument.
		///
		/// For instrance, `reg & 0x00` will behave the same way regardless of the reg's
		/// value.
bool AVRExpandPseudo::isLogicRegOpUndef(unsigned Op, unsigned ImmVal) const {		bool AVRExpandPseudo::isLogicRegOpUndef(unsigned Op, unsigned ImmVal) const {
// ANDI Rd, 0x00 clears all input bits.		// ANDI Rd, 0x00 clears all input bits.
if (Op == AVR::ANDIRdK && ImmVal == 0x00)		if (Op == AVR::ANDIRdK && ImmVal == 0x00)
return true;		return true;

// ORI Rd, 0xff sets all input bits.		// ORI Rd, 0xff sets all input bits.
if (Op == AVR::ORIRdK && ImmVal == 0xff)		if (Op == AVR::ORIRdK && ImmVal == 0xff)
return true;		return true;
▲ Show 20 Lines • Show All 2,411 Lines • Show Last 20 Lines

llvm/lib/Target/AVR/AVRISelLowering.cpp

//===-- AVRISelLowering.cpp - AVR DAG Lowering Implementation -------------===//		//===-- AVRISelLowering.cpp - AVR DAG Lowering Implementation -------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
▲ Show 20 Lines • Show All 272 Lines • ▼ Show 20 Lines	SDValue AVRTargetLowering::LowerShifts(SDValue Op, SelectionDAG &DAG) const {
unsigned Opc8;		unsigned Opc8;
const SDNode *N = Op.getNode();		const SDNode *N = Op.getNode();
EVT VT = Op.getValueType();		EVT VT = Op.getValueType();
SDLoc dl(N);		SDLoc dl(N);
assert(llvm::has_single_bit<uint32_t>(VT.getSizeInBits()) &&		assert(llvm::has_single_bit<uint32_t>(VT.getSizeInBits()) &&
"Expected power-of-2 shift amount");		"Expected power-of-2 shift amount");

if (VT.getSizeInBits() == 32) {		if (VT.getSizeInBits() == 32) {
if (!isa<ConstantSDNode>(N->getOperand(1))) {
// 32-bit shifts are converted to a loop in IR.
// This should be unreachable.
report_fatal_error("Expected a constant shift amount!");
}
SDVTList ResTys = DAG.getVTList(MVT::i16, MVT::i16);		SDVTList ResTys = DAG.getVTList(MVT::i16, MVT::i16);
SDValue SrcLo =		SDValue SrcLo =
DAG.getNode(ISD::EXTRACT_ELEMENT, dl, MVT::i16, Op.getOperand(0),		DAG.getNode(ISD::EXTRACT_ELEMENT, dl, MVT::i16, Op.getOperand(0),
DAG.getConstant(0, dl, MVT::i16));		DAG.getConstant(0, dl, MVT::i16));
SDValue SrcHi =		SDValue SrcHi =
DAG.getNode(ISD::EXTRACT_ELEMENT, dl, MVT::i16, Op.getOperand(0),		DAG.getNode(ISD::EXTRACT_ELEMENT, dl, MVT::i16, Op.getOperand(0),
DAG.getConstant(1, dl, MVT::i16));		DAG.getConstant(1, dl, MVT::i16));
		SDValue Cnt;
		if (isa<ConstantSDNode>(N->getOperand(1))) {
		// The amount to shift is known at compile time, so we can create an
		// optimized sequence of instructions to shift this value.
uint64_t ShiftAmount =		uint64_t ShiftAmount =
cast<ConstantSDNode>(N->getOperand(1))->getZExtValue();		cast<ConstantSDNode>(N->getOperand(1))->getZExtValue();
if (ShiftAmount == 16) {		if (ShiftAmount == 16) {
// Special case these two operations because they appear to be used by the		// Special case these two operations because they appear to be used by
// generic codegen parts to lower 32-bit numbers.		// the generic codegen parts to lower 32-bit numbers.
// TODO: perhaps we can lower shift amounts bigger than 16 to a 16-bit		// TODO: perhaps we can lower shift amounts bigger than 16 to a 16-bit
// shift of a part of the 32-bit value?		// shift of a part of the 32-bit value?
switch (Op.getOpcode()) {		switch (Op.getOpcode()) {
case ISD::SHL: {		case ISD::SHL: {
SDValue Zero = DAG.getConstant(0, dl, MVT::i16);		SDValue Zero = DAG.getConstant(0, dl, MVT::i16);
return DAG.getNode(ISD::BUILD_PAIR, dl, MVT::i32, Zero, SrcLo);		return DAG.getNode(ISD::BUILD_PAIR, dl, MVT::i32, Zero, SrcLo);
}		}
case ISD::SRL: {		case ISD::SRL: {
SDValue Zero = DAG.getConstant(0, dl, MVT::i16);		SDValue Zero = DAG.getConstant(0, dl, MVT::i16);
return DAG.getNode(ISD::BUILD_PAIR, dl, MVT::i32, SrcHi, Zero);		return DAG.getNode(ISD::BUILD_PAIR, dl, MVT::i32, SrcHi, Zero);
}		}
}		}
}		}
SDValue Cnt = DAG.getTargetConstant(ShiftAmount, dl, MVT::i8);		Cnt = DAG.getTargetConstant(ShiftAmount, dl, MVT::i8);
		} else {
		// The shift is not known at compile time, so we have to emit this as a
		// loop.
		Cnt = DAG.getNode(ISD::TRUNCATE, dl, MVT::i8, Op.getOperand(1));
		}
unsigned Opc;		unsigned Opc;
switch (Op.getOpcode()) {		switch (Op.getOpcode()) {
default:		default:
llvm_unreachable("Invalid 32-bit shift opcode!");		llvm_unreachable("Invalid 32-bit shift opcode!");
case ISD::SHL:		case ISD::SHL:
Opc = AVRISD::LSLW;		Opc = AVRISD::LSLW;
break;		break;
case ISD::SRL:		case ISD::SRL:
▲ Show 20 Lines • Show All 1,584 Lines • ▼ Show 20 Lines
// instruction), we have to emulate this behavior with other instructions.		// instruction), we have to emulate this behavior with other instructions.
// It first tries large steps (moving registers around) and then smaller steps		// It first tries large steps (moving registers around) and then smaller steps
// like single bit shifts.		// like single bit shifts.
// Large shifts actually reduce the number of shifted registers, so the below		// Large shifts actually reduce the number of shifted registers, so the below
// algorithms have to work independently of the number of registers that are		// algorithms have to work independently of the number of registers that are
// shifted.		// shifted.
// For more information and background, see this blogpost:		// For more information and background, see this blogpost:
// https://aykevl.nl/2021/02/avr-bitshift		// https://aykevl.nl/2021/02/avr-bitshift
static void insertMultibyteShift(MachineInstr &MI, MachineBasicBlock *BB,		static void insertMultibyteShift(MachineBasicBlock::iterator MBBI,
		MachineBasicBlock *BB, const DebugLoc &DL,
MutableArrayRef<std::pair<Register, int>> Regs,		MutableArrayRef<std::pair<Register, int>> Regs,
ISD::NodeType Opc, int64_t ShiftAmt) {		ISD::NodeType Opc, int64_t ShiftAmt) {
const TargetInstrInfo &TII = *BB->getParent()->getSubtarget().getInstrInfo();		const TargetInstrInfo &TII = *BB->getParent()->getSubtarget().getInstrInfo();
const AVRSubtarget &STI = BB->getParent()->getSubtarget<AVRSubtarget>();		const AVRSubtarget &STI = BB->getParent()->getSubtarget<AVRSubtarget>();
MachineRegisterInfo &MRI = BB->getParent()->getRegInfo();		MachineRegisterInfo &MRI = BB->getParent()->getRegInfo();
const DebugLoc &dl = MI.getDebugLoc();

const bool ShiftLeft = Opc == ISD::SHL;		const bool ShiftLeft = Opc == ISD::SHL;
const bool ArithmeticShift = Opc == ISD::SRA;		const bool ArithmeticShift = Opc == ISD::SRA;

// Zero a register, for use in later operations.		// Zero a register, for use in later operations.
Register ZeroReg = MRI.createVirtualRegister(&AVR::GPR8RegClass);		Register ZeroReg = MRI.createVirtualRegister(&AVR::GPR8RegClass);
BuildMI(*BB, MI, dl, TII.get(AVR::COPY), ZeroReg)		BuildMI(*BB, MBBI, DL, TII.get(AVR::COPY), ZeroReg)
.addReg(STI.getZeroRegister());		.addReg(STI.getZeroRegister());

// Do a shift modulo 6 or 7. This is a bit more complicated than most shifts		// Do a shift modulo 6 or 7. This is a bit more complicated than most shifts
// and is hard to compose with the rest, so these are special cased.		// and is hard to compose with the rest, so these are special cased.
// The basic idea is to shift one or two bits in the opposite direction and		// The basic idea is to shift one or two bits in the opposite direction and
// then move registers around to get the correct end result.		// then move registers around to get the correct end result.
if (ShiftLeft && (ShiftAmt % 8) >= 6) {		if (ShiftLeft && (ShiftAmt % 8) >= 6) {
// Left shift modulo 6 or 7.		// Left shift modulo 6 or 7.

// Create a slice of the registers we're going to modify, to ease working		// Create a slice of the registers we're going to modify, to ease working
// with them.		// with them.
size_t ShiftRegsOffset = ShiftAmt / 8;		size_t ShiftRegsOffset = ShiftAmt / 8;
size_t ShiftRegsSize = Regs.size() - ShiftRegsOffset;		size_t ShiftRegsSize = Regs.size() - ShiftRegsOffset;
MutableArrayRef<std::pair<Register, int>> ShiftRegs =		MutableArrayRef<std::pair<Register, int>> ShiftRegs =
Regs.slice(ShiftRegsOffset, ShiftRegsSize);		Regs.slice(ShiftRegsOffset, ShiftRegsSize);

// Shift one to the right, keeping the least significant bit as the carry		// Shift one to the right, keeping the least significant bit as the carry
// bit.		// bit.
insertMultibyteShift(MI, BB, ShiftRegs, ISD::SRL, 1);		insertMultibyteShift(MBBI, BB, DL, ShiftRegs, ISD::SRL, 1);

// Rotate the least significant bit from the carry bit into a new register		// Rotate the least significant bit from the carry bit into a new register
// (that starts out zero).		// (that starts out zero).
Register LowByte = MRI.createVirtualRegister(&AVR::GPR8RegClass);		Register LowByte = MRI.createVirtualRegister(&AVR::GPR8RegClass);
BuildMI(*BB, MI, dl, TII.get(AVR::RORRd), LowByte).addReg(ZeroReg);		BuildMI(*BB, MBBI, DL, TII.get(AVR::RORRd), LowByte).addReg(ZeroReg);

// Shift one more to the right if this is a modulo-6 shift.		// Shift one more to the right if this is a modulo-6 shift.
if (ShiftAmt % 8 == 6) {		if (ShiftAmt % 8 == 6) {
insertMultibyteShift(MI, BB, ShiftRegs, ISD::SRL, 1);		insertMultibyteShift(MBBI, BB, DL, ShiftRegs, ISD::SRL, 1);
Register NewLowByte = MRI.createVirtualRegister(&AVR::GPR8RegClass);		Register NewLowByte = MRI.createVirtualRegister(&AVR::GPR8RegClass);
BuildMI(*BB, MI, dl, TII.get(AVR::RORRd), NewLowByte).addReg(LowByte);		BuildMI(*BB, MBBI, DL, TII.get(AVR::RORRd), NewLowByte).addReg(LowByte);
LowByte = NewLowByte;		LowByte = NewLowByte;
}		}

// Move all registers to the left, zeroing the bottom registers as needed.		// Move all registers to the left, zeroing the bottom registers as needed.
for (size_t I = 0; I < Regs.size(); I++) {		for (size_t I = 0; I < Regs.size(); I++) {
int ShiftRegsIdx = I + 1;		int ShiftRegsIdx = I + 1;
if (ShiftRegsIdx < (int)ShiftRegs.size()) {		if (ShiftRegsIdx < (int)ShiftRegs.size()) {
Regs[I] = ShiftRegs[ShiftRegsIdx];		Regs[I] = ShiftRegs[ShiftRegsIdx];
Show All 11 Lines	static void insertMultibyteShift(MachineBasicBlock::iterator MBBI,
if (!ShiftLeft && (ShiftAmt % 8) >= 6) {		if (!ShiftLeft && (ShiftAmt % 8) >= 6) {
// Create a view on the registers we're going to modify, to ease working		// Create a view on the registers we're going to modify, to ease working
// with them.		// with them.
size_t ShiftRegsSize = Regs.size() - (ShiftAmt / 8);		size_t ShiftRegsSize = Regs.size() - (ShiftAmt / 8);
MutableArrayRef<std::pair<Register, int>> ShiftRegs =		MutableArrayRef<std::pair<Register, int>> ShiftRegs =
Regs.slice(0, ShiftRegsSize);		Regs.slice(0, ShiftRegsSize);

// Shift one to the left.		// Shift one to the left.
insertMultibyteShift(MI, BB, ShiftRegs, ISD::SHL, 1);		insertMultibyteShift(MBBI, BB, DL, ShiftRegs, ISD::SHL, 1);

// Sign or zero extend the most significant register into a new register.		// Sign or zero extend the most significant register into a new register.
// The HighByte is the byte that still has one (or two) bits from the		// The HighByte is the byte that still has one (or two) bits from the
// original value. The ExtByte is purely a zero/sign extend byte (all bits		// original value. The ExtByte is purely a zero/sign extend byte (all bits
// are either 0 or 1).		// are either 0 or 1).
Register HighByte = MRI.createVirtualRegister(&AVR::GPR8RegClass);		Register HighByte = MRI.createVirtualRegister(&AVR::GPR8RegClass);
Register ExtByte = 0;		Register ExtByte = 0;
if (ArithmeticShift) {		if (ArithmeticShift) {
// Sign-extend bit that was shifted out last.		// Sign-extend bit that was shifted out last.
BuildMI(*BB, MI, dl, TII.get(AVR::SBCRdRr), HighByte)		BuildMI(*BB, MBBI, DL, TII.get(AVR::SBCRdRr), HighByte)
.addReg(HighByte, RegState::Undef)		.addReg(HighByte, RegState::Undef)
.addReg(HighByte, RegState::Undef);		.addReg(HighByte, RegState::Undef);
ExtByte = HighByte;		ExtByte = HighByte;
// The highest bit of the original value is the same as the zero-extend		// The highest bit of the original value is the same as the zero-extend
// byte, so HighByte and ExtByte are the same.		// byte, so HighByte and ExtByte are the same.
} else {		} else {
// Use the zero register for zero extending.		// Use the zero register for zero extending.
ExtByte = ZeroReg;		ExtByte = ZeroReg;
// Rotate most significant bit into a new register (that starts out zero).		// Rotate most significant bit into a new register (that starts out zero).
BuildMI(*BB, MI, dl, TII.get(AVR::ADCRdRr), HighByte)		BuildMI(*BB, MBBI, DL, TII.get(AVR::ADCRdRr), HighByte)
.addReg(ExtByte)		.addReg(ExtByte)
.addReg(ExtByte);		.addReg(ExtByte);
}		}

// Shift one more to the left for modulo 6 shifts.		// Shift one more to the left for modulo 6 shifts.
if (ShiftAmt % 8 == 6) {		if (ShiftAmt % 8 == 6) {
insertMultibyteShift(MI, BB, ShiftRegs, ISD::SHL, 1);		insertMultibyteShift(MBBI, BB, DL, ShiftRegs, ISD::SHL, 1);
// Shift the topmost bit into the HighByte.		// Shift the topmost bit into the HighByte.
Register NewExt = MRI.createVirtualRegister(&AVR::GPR8RegClass);		Register NewExt = MRI.createVirtualRegister(&AVR::GPR8RegClass);
BuildMI(*BB, MI, dl, TII.get(AVR::ADCRdRr), NewExt)		BuildMI(*BB, MBBI, DL, TII.get(AVR::ADCRdRr), NewExt)
.addReg(HighByte)		.addReg(HighByte)
.addReg(HighByte);		.addReg(HighByte);
HighByte = NewExt;		HighByte = NewExt;
}		}

// Move all to the right, while sign or zero extending.		// Move all to the right, while sign or zero extending.
for (int I = Regs.size() - 1; I >= 0; I--) {		for (int I = Regs.size() - 1; I >= 0; I--) {
int ShiftRegsIdx = I - (Regs.size() - ShiftRegs.size()) - 1;		int ShiftRegsIdx = I - (Regs.size() - ShiftRegs.size()) - 1;
Show All 28 Lines	static void insertMultibyteShift(MachineBasicBlock::iterator MBBI,

// And again, the same for right shifts.		// And again, the same for right shifts.
Register ShrExtendReg = 0;		Register ShrExtendReg = 0;
if (!ShiftLeft && ShiftAmt >= 8) {		if (!ShiftLeft && ShiftAmt >= 8) {
if (ArithmeticShift) {		if (ArithmeticShift) {
// Sign extend the most significant register into ShrExtendReg.		// Sign extend the most significant register into ShrExtendReg.
ShrExtendReg = MRI.createVirtualRegister(&AVR::GPR8RegClass);		ShrExtendReg = MRI.createVirtualRegister(&AVR::GPR8RegClass);
Register Tmp = MRI.createVirtualRegister(&AVR::GPR8RegClass);		Register Tmp = MRI.createVirtualRegister(&AVR::GPR8RegClass);
BuildMI(*BB, MI, dl, TII.get(AVR::ADDRdRr), Tmp)		BuildMI(*BB, MBBI, DL, TII.get(AVR::ADDRdRr), Tmp)
.addReg(Regs[0].first, 0, Regs[0].second)		.addReg(Regs[0].first, 0, Regs[0].second)
.addReg(Regs[0].first, 0, Regs[0].second);		.addReg(Regs[0].first, 0, Regs[0].second);
BuildMI(*BB, MI, dl, TII.get(AVR::SBCRdRr), ShrExtendReg)		BuildMI(*BB, MBBI, DL, TII.get(AVR::SBCRdRr), ShrExtendReg)
.addReg(Tmp)		.addReg(Tmp)
.addReg(Tmp);		.addReg(Tmp);
} else {		} else {
ShrExtendReg = ZeroReg;		ShrExtendReg = ZeroReg;
}		}
for (; ShiftAmt >= 8; ShiftAmt -= 8) {		for (; ShiftAmt >= 8; ShiftAmt -= 8) {
// Move all registers one to the right.		// Move all registers one to the right.
for (size_t I = Regs.size() - 1; I != 0; I--) {		for (size_t I = Regs.size() - 1; I != 0; I--) {
Show All 29 Lines	static void insertMultibyteShift(MachineBasicBlock::iterator MBBI,
// eor r1, r2		// eor r1, r2
// andi r2, 0x0f		// andi r2, 0x0f
// eor r1, r2		// eor r1, r2
if (!ArithmeticShift && ShiftAmt >= 4) {		if (!ArithmeticShift && ShiftAmt >= 4) {
Register Prev = 0;		Register Prev = 0;
for (size_t I = 0; I < Regs.size(); I++) {		for (size_t I = 0; I < Regs.size(); I++) {
size_t Idx = ShiftLeft ? I : Regs.size() - I - 1;		size_t Idx = ShiftLeft ? I : Regs.size() - I - 1;
Register SwapReg = MRI.createVirtualRegister(&AVR::LD8RegClass);		Register SwapReg = MRI.createVirtualRegister(&AVR::LD8RegClass);
BuildMI(*BB, MI, dl, TII.get(AVR::SWAPRd), SwapReg)		BuildMI(*BB, MBBI, DL, TII.get(AVR::SWAPRd), SwapReg)
.addReg(Regs[Idx].first, 0, Regs[Idx].second);		.addReg(Regs[Idx].first, 0, Regs[Idx].second);
if (I != 0) {		if (I != 0) {
Register R = MRI.createVirtualRegister(&AVR::GPR8RegClass);		Register R = MRI.createVirtualRegister(&AVR::GPR8RegClass);
BuildMI(*BB, MI, dl, TII.get(AVR::EORRdRr), R)		BuildMI(*BB, MBBI, DL, TII.get(AVR::EORRdRr), R)
.addReg(Prev)		.addReg(Prev)
.addReg(SwapReg);		.addReg(SwapReg);
Prev = R;		Prev = R;
}		}
Register AndReg = MRI.createVirtualRegister(&AVR::LD8RegClass);		Register AndReg = MRI.createVirtualRegister(&AVR::LD8RegClass);
BuildMI(*BB, MI, dl, TII.get(AVR::ANDIRdK), AndReg)		BuildMI(*BB, MBBI, DL, TII.get(AVR::ANDIRdK), AndReg)
.addReg(SwapReg)		.addReg(SwapReg)
.addImm(ShiftLeft ? 0xf0 : 0x0f);		.addImm(ShiftLeft ? 0xf0 : 0x0f);
if (I != 0) {		if (I != 0) {
Register R = MRI.createVirtualRegister(&AVR::GPR8RegClass);		Register R = MRI.createVirtualRegister(&AVR::GPR8RegClass);
BuildMI(*BB, MI, dl, TII.get(AVR::EORRdRr), R)		BuildMI(*BB, MBBI, DL, TII.get(AVR::EORRdRr), R)
.addReg(Prev)		.addReg(Prev)
.addReg(AndReg);		.addReg(AndReg);
size_t PrevIdx = ShiftLeft ? Idx - 1 : Idx + 1;		size_t PrevIdx = ShiftLeft ? Idx - 1 : Idx + 1;
Regs[PrevIdx] = std::pair(R, 0);		Regs[PrevIdx] = std::pair(R, 0);
}		}
Prev = AndReg;		Prev = AndReg;
Regs[Idx] = std::pair(AndReg, 0);		Regs[Idx] = std::pair(AndReg, 0);
}		}
ShiftAmt -= 4;		ShiftAmt -= 4;
}		}

// Shift by one. This is the fallback that always works, and the shift		// Shift by one. This is the fallback that always works, and the shift
// operation that is used for 1, 2, and 3 bit shifts.		// operation that is used for 1, 2, and 3 bit shifts.
while (ShiftLeft && ShiftAmt) {		while (ShiftLeft && ShiftAmt) {
// Shift one to the left.		// Shift one to the left.
for (ssize_t I = Regs.size() - 1; I >= 0; I--) {		for (ssize_t I = Regs.size() - 1; I >= 0; I--) {
Register Out = MRI.createVirtualRegister(&AVR::GPR8RegClass);		Register Out = MRI.createVirtualRegister(&AVR::GPR8RegClass);
Register In = Regs[I].first;		Register In = Regs[I].first;
Register InSubreg = Regs[I].second;		Register InSubreg = Regs[I].second;
if (I == (ssize_t)Regs.size() - 1) { // first iteration		if (I == (ssize_t)Regs.size() - 1) { // first iteration
BuildMI(*BB, MI, dl, TII.get(AVR::ADDRdRr), Out)		BuildMI(*BB, MBBI, DL, TII.get(AVR::ADDRdRr), Out)
.addReg(In, 0, InSubreg)		.addReg(In, 0, InSubreg)
.addReg(In, 0, InSubreg);		.addReg(In, 0, InSubreg);
} else {		} else {
BuildMI(*BB, MI, dl, TII.get(AVR::ADCRdRr), Out)		BuildMI(*BB, MBBI, DL, TII.get(AVR::ADCRdRr), Out)
.addReg(In, 0, InSubreg)		.addReg(In, 0, InSubreg)
.addReg(In, 0, InSubreg);		.addReg(In, 0, InSubreg);
}		}
Regs[I] = std::pair(Out, 0);		Regs[I] = std::pair(Out, 0);
}		}
ShiftAmt--;		ShiftAmt--;
}		}
while (!ShiftLeft && ShiftAmt) {		while (!ShiftLeft && ShiftAmt) {
// Shift one to the right.		// Shift one to the right.
for (size_t I = 0; I < Regs.size(); I++) {		for (size_t I = 0; I < Regs.size(); I++) {
Register Out = MRI.createVirtualRegister(&AVR::GPR8RegClass);		Register Out = MRI.createVirtualRegister(&AVR::GPR8RegClass);
Register In = Regs[I].first;		Register In = Regs[I].first;
Register InSubreg = Regs[I].second;		Register InSubreg = Regs[I].second;
if (I == 0) {		if (I == 0) {
unsigned Opc = ArithmeticShift ? AVR::ASRRd : AVR::LSRRd;		unsigned Opc = ArithmeticShift ? AVR::ASRRd : AVR::LSRRd;
BuildMI(*BB, MI, dl, TII.get(Opc), Out).addReg(In, 0, InSubreg);		BuildMI(*BB, MBBI, DL, TII.get(Opc), Out).addReg(In, 0, InSubreg);
} else {		} else {
BuildMI(*BB, MI, dl, TII.get(AVR::RORRd), Out).addReg(In, 0, InSubreg);		BuildMI(*BB, MBBI, DL, TII.get(AVR::RORRd), Out)
		.addReg(In, 0, InSubreg);
}		}
Regs[I] = std::pair(Out, 0);		Regs[I] = std::pair(Out, 0);
}		}
ShiftAmt--;		ShiftAmt--;
}		}

if (ShiftAmt != 0) {		if (ShiftAmt != 0) {
llvm_unreachable("don't know how to shift!"); // sanity check		llvm_unreachable("don't know how to shift!"); // sanity check
}		}
}		}
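The 4-bit step in `insertMultibyteShift` (the `swap` / `andi` / `eor` sequence above) can be modelled on plain bytes. The following is only a sketch for the left-shift case, assuming the same byte order as the diff (index 0 is the most significant byte); the helper names `swapNibbles`, `shiftLeftByNibble`, `toBytes` and `fromBytes` are made up for illustration and are not part of LLVM.

```cpp
#include <array>
#include <cassert> // only needed if you add checks such as the ones in the note below
#include <cstddef>
#include <cstdint>

// Model of the AVR `swap` instruction: exchange the high and low nibble.
static uint8_t swapNibbles(uint8_t B) {
  return static_cast<uint8_t>((B << 4) | (B >> 4));
}

// Split a 32-bit value into bytes, most significant byte first.
std::array<uint8_t, 4> toBytes(uint32_t X) {
  return {static_cast<uint8_t>(X >> 24), static_cast<uint8_t>(X >> 16),
          static_cast<uint8_t>(X >> 8), static_cast<uint8_t>(X)};
}

// Inverse of toBytes.
uint32_t fromBytes(const std::array<uint8_t, 4> &B) {
  return (static_cast<uint32_t>(B[0]) << 24) |
         (static_cast<uint32_t>(B[1]) << 16) |
         (static_cast<uint32_t>(B[2]) << 8) | static_cast<uint32_t>(B[3]);
}

// Shift a 4-byte value left by 4 bits using only swap, and-immediate and
// exclusive-or, mirroring the structure of the loop in the diff.
std::array<uint8_t, 4> shiftLeftByNibble(std::array<uint8_t, 4> Bytes) {
  uint8_t Prev = 0;
  for (std::size_t I = 0; I < Bytes.size(); I++) {
    uint8_t Swapped = swapNibbles(Bytes[I]);           // swap
    if (I != 0)
      Prev = static_cast<uint8_t>(Prev ^ Swapped);     // eor
    uint8_t Masked = Swapped & 0xf0;                   // andi 0xf0
    if (I != 0)
      Bytes[I - 1] = static_cast<uint8_t>(Prev ^ Masked); // eor
    Prev = Masked;
    Bytes[I] = Masked;
  }
  return Bytes;
}
```

The two `eor` steps cancel the high nibble of the swapped byte, so the low nibble of the more significant byte ends up holding the nibble shifted out of its neighbour. A quick check is `assert(fromBytes(shiftLeftByNibble(toBytes(0x12345678u))) == 0x23456780u);`.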

		// Do a multibyte shift by shifting one bit at a time in a loop. It works very
		// similarly to insertMultibyteShift in that it modifies the Regs array in-place
		// (the output registers are stored in this array on return).
		static MachineBasicBlock *insertMultibyteShiftLoop(
Patryk27 (Author, Not Done):

Note that I've changed the code to re-arrange the generated blocks a bit, from:

body:             |
  bb.0 (%ir-block.0):
    successors: %bb.2(0x80000000)
    liveins: $r23r22, $r25r24, $r19r18

    %2:dregs = COPY $r19r18
    %1:dregs = COPY $r25r24
    %0:dregs = COPY $r23r22
    %4:gpr8 = COPY %2.sub_lo
    RJMPk %bb.2

  bb.1 (%ir-block.0):
    successors: %bb.2(0x80000000)

    %12:gpr8 = ADDRdRr %10, %10, implicit-def $sreg
    %13:gpr8 = ADCRdRr %9, %9, implicit-def $sreg, implicit $sreg
    %14:gpr8 = ADCRdRr %8, %8, implicit-def $sreg, implicit $sreg
    %15:gpr8 = ADCRdRr %7, %7, implicit-def $sreg, implicit $sreg

  bb.2 (%ir-block.0):
    successors: %bb.1(0x40000000), %bb.3(0x40000000)

    %7:gpr8 = PHI %1.sub_hi, %bb.0, %15, %bb.1
    %8:gpr8 = PHI %1.sub_lo, %bb.0, %14, %bb.1
    %9:gpr8 = PHI %0.sub_hi, %bb.0, %13, %bb.1
    %10:gpr8 = PHI %0.sub_lo, %bb.0, %12, %bb.1
    %16:gpr8 = PHI %4, %bb.0, %17, %bb.1
    %17:gpr8 = DECRd %16, implicit-def $sreg
    BRPLk %bb.1, implicit $sreg

  bb.3 (%ir-block.0):
    %6:dregs = REG_SEQUENCE %7, %subreg.sub_hi, %8, %subreg.sub_lo
    %5:dregs = REG_SEQUENCE %9, %subreg.sub_hi, %10, %subreg.sub_lo
    $r23r22 = COPY %5
    $r25r24 = COPY %6
    RET implicit $r23r22, implicit $r25r24, implicit $r1
...

to:

body:             |
  bb.0 (%ir-block.0):
    successors: %bb.1(0x80000000)
    liveins: $r23r22, $r25r24, $r19r18

    %2:dregs = COPY $r19r18
    %1:dregs = COPY $r25r24
    %0:dregs = COPY $r23r22
    %4:gpr8 = COPY %2.sub_lo
    # fall-through instead of jumping

  bb.1 (%ir-block.0):
    successors: %bb.2(0x40000000), %bb.3(0x40000000)

    %7:gpr8 = PHI %1.sub_hi, %bb.0, %15, %bb.2
    %8:gpr8 = PHI %1.sub_lo, %bb.0, %14, %bb.2
    %9:gpr8 = PHI %0.sub_hi, %bb.0, %13, %bb.2
    %10:gpr8 = PHI %0.sub_lo, %bb.0, %12, %bb.2
    %16:gpr8 = PHI %4, %bb.0, %17, %bb.2
    %17:gpr8 = DECRd %16, implicit-def $sreg
    BRMIk %bb.3, implicit $sreg  # <- reversed comparison + fallthrough

  bb.2 (%ir-block.0):
    successors: %bb.1(0x80000000)

    %12:gpr8 = ADDRdRr %10, %10, implicit-def $sreg
    %13:gpr8 = ADCRdRr %9, %9, implicit-def $sreg, implicit $sreg
    %14:gpr8 = ADCRdRr %8, %8, implicit-def $sreg, implicit $sreg
    %15:gpr8 = ADCRdRr %7, %7, implicit-def $sreg, implicit $sreg
    RJMPk %bb.1  # <- jump to the beginning

  bb.3 (%ir-block.0):
    %6:dregs = REG_SEQUENCE %7, %subreg.sub_hi, %8, %subreg.sub_lo
    %5:dregs = REG_SEQUENCE %9, %subreg.sub_hi, %10, %subreg.sub_lo
    $r23r22 = COPY %5
    $r25r24 = COPY %6
    RET implicit $r23r22, implicit $r25r24, implicit $r1

It looks like the generated assembly remained the same; I've also checked the actual binary through rustc + simavr.
		MachineInstr &MI, MachineBasicBlock *BB, Register ShiftNum,
		MutableArrayRef<std::pair<Register, int>> Regs, ISD::NodeType Opc) {
		const DebugLoc &DL = MI.getDebugLoc();
		MachineFunction *MF = BB->getParent();
		const TargetInstrInfo &TII = *BB->getParent()->getSubtarget().getInstrInfo();
		MachineRegisterInfo &MRI = BB->getParent()->getRegInfo();

		MachineBasicBlock *EntryBB = BB;
		MachineBasicBlock *CheckBB = MF->CreateMachineBasicBlock(BB->getBasicBlock());
		MachineBasicBlock *LoopBB = MF->CreateMachineBasicBlock(BB->getBasicBlock());

		MF->push_back(CheckBB);
		MF->push_back(LoopBB);
		MachineBasicBlock *ExitBB = EntryBB->splitAt(MI, false);
Patryk27 (Author, Not Done):

Alright, this is wrong after all: I've just tested it on more elaborate code in rustc, and `EntryBB->removeSuccessor(ExitBB);` triggers an LLVM panic (presumably because EntryBB == ExitBB).

I don't quite understand why doing something like this:

  MachineBasicBlock *ExitBB = EntryBB->splitAt(MI, false);

  if (EntryBB == ExitBB) {
    assert(EntryBB->canFallThrough() && "Expected a fallthrough block!");
    ExitBB = EntryBB->getFallThrough();
  }

  ...

is not sufficient, though 👀

benshi001 (Not Done):

Is it possible to fix the 32-bit shift issue in a more moderate way? For example, by keeping the pass in `AVRShiftExpand.cpp`.

		CheckBB->moveAfter(EntryBB);
		LoopBB->moveAfter(CheckBB);
		ExitBB->moveAfter(LoopBB);

		EntryBB->addSuccessor(CheckBB);
		LoopBB->addSuccessor(CheckBB);
		CheckBB->addSuccessor(LoopBB);
		CheckBB->addSuccessor(ExitBB);
		EntryBB->removeSuccessor(ExitBB);

		// Create virtual registers for the value phi nodes.
		SmallVector<Register, 4> PhiRegs;
		SmallVector<std::pair<Register, int>, 4> PhiRegPairs;

		for (size_t I = 0; I < Regs.size(); I++) {
		Register Reg = MRI.createVirtualRegister(&AVR::GPR8RegClass);
		PhiRegs.push_back(Reg);
		PhiRegPairs.push_back(std::pair(Reg, 0));
		}

		// Shift the registers by one.
		//
		// Note that the blocks are built in reverse order (in the final layout
		// LoopBB comes after CheckBB): the PHI nodes in CheckBB need the registers
		// that are produced in LoopBB.
		insertMultibyteShift(LoopBB->end(), LoopBB, DL, PhiRegPairs, Opc, 1);

		// Jump back to the loop's check block (CheckBB).
		BuildMI(LoopBB, DL, TII.get(AVR::RJMPk)).addMBB(CheckBB);

		// Create PHI nodes for the value that is shifted.
		for (size_t I = 0; I < Regs.size(); I++) {
		auto Pair = Regs[I];

		BuildMI(CheckBB, DL, TII.get(AVR::PHI), PhiRegs[I])
		.addReg(Pair.first, 0, Pair.second)
		.addMBB(EntryBB)
		.addReg(PhiRegPairs[I].first, 0, PhiRegPairs[I].second)
		.addMBB(LoopBB);

		Regs[I] = std::pair(PhiRegs[I], 0);
		}

		// Create a PHI node for the loop counter.
		Register CntPhi = MRI.createVirtualRegister(&AVR::GPR8RegClass);
		Register CntDec = MRI.createVirtualRegister(&AVR::GPR8RegClass);

		BuildMI(CheckBB, DL, TII.get(AVR::PHI), CntPhi)
		.addReg(ShiftNum)
		.addMBB(EntryBB)
		.addReg(CntDec)
		.addMBB(LoopBB);

		// Decrement the counter; if we're done, jump to the exit, otherwise fall
		// through to LoopBB.
		BuildMI(CheckBB, DL, TII.get(AVR::DECRd), CntDec).addReg(CntPhi);
		BuildMI(CheckBB, DL, TII.get(AVR::BRMIk)).addMBB(ExitBB);

		return ExitBB;
		}
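The control flow built by `insertMultibyteShiftLoop` (decrement in CheckBB, `brmi` to ExitBB, one-bit shift in LoopBB, `rjmp` back to CheckBB) can be traced with a small simulation. This is a sketch only: the function name `shiftLeftLoop` and the `Iterations` out-parameter are invented for illustration, and a plain `uint32_t` stands in for the four 8-bit registers.

```cpp
#include <cassert>
#include <cstdint>

// Simulate the check-first loop for a 32-bit left shift.
// Optionally reports how many times the loop body ran.
uint32_t shiftLeftLoop(uint32_t Value, uint8_t Amount,
                       unsigned *Iterations = nullptr) {
  // An i32 shift by 32 or more is poison in LLVM IR, so this sketch only
  // accepts amounts below 32.
  assert(Amount < 32 && "shift amount out of range for i32");

  unsigned Count = 0;
  uint8_t Counter = Amount;
  for (;;) {
    // CheckBB: `dec` the counter, then `brmi` to ExitBB if the result is
    // negative (bit 7 set).
    Counter = static_cast<uint8_t>(Counter - 1);
    if (Counter & 0x80)
      break; // ExitBB
    // LoopBB: shift by one bit, then `rjmp` back to CheckBB.
    Value <<= 1;
    ++Count;
  }
  if (Iterations)
    *Iterations = Count;
  return Value;
}
```

Because the check comes first, an amount of 0 decrements to 0xff and leaves immediately without shifting, which is why no separate zero test is needed before the loop. For example, `shiftLeftLoop(1, 5)` runs the body five times and yields 32.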

// Do a wide (32-bit) shift.		// Do a wide (32-bit) shift.
MachineBasicBlock *		MachineBasicBlock *
AVRTargetLowering::insertWideShift(MachineInstr &MI,		AVRTargetLowering::insertWideShift(MachineInstr &MI,
MachineBasicBlock *BB) const {		MachineBasicBlock *BB) const {
const TargetInstrInfo &TII = *Subtarget.getInstrInfo();		const TargetInstrInfo &TII = *Subtarget.getInstrInfo();
const DebugLoc &dl = MI.getDebugLoc();		const DebugLoc &DL = MI.getDebugLoc();
		MachineBasicBlock::iterator MBBI(&MI);

// How much to shift to the right (meaning: a negative number indicates a left
// shift).
int64_t ShiftAmt = MI.getOperand(4).getImm();
ISD::NodeType Opc;		ISD::NodeType Opc;
switch (MI.getOpcode()) {		switch (MI.getOpcode()) {
case AVR::Lsl32:		case AVR::Lsl32:
Opc = ISD::SHL;		Opc = ISD::SHL;
break;		break;
case AVR::Lsr32:		case AVR::Lsr32:
Opc = ISD::SRL;		Opc = ISD::SRL;
break;		break;
case AVR::Asr32:		case AVR::Asr32:
Opc = ISD::SRA;		Opc = ISD::SRA;
break;		break;
}		}

// Read the input registers, with the most significant register at index 0.		// Read the input registers, with the most significant register at index 0.
std::array<std::pair<Register, int>, 4> Registers = {		std::array<std::pair<Register, int>, 4> Registers = {
std::pair(MI.getOperand(3).getReg(), AVR::sub_hi),		std::pair(MI.getOperand(3).getReg(), AVR::sub_hi),
std::pair(MI.getOperand(3).getReg(), AVR::sub_lo),		std::pair(MI.getOperand(3).getReg(), AVR::sub_lo),
std::pair(MI.getOperand(2).getReg(), AVR::sub_hi),		std::pair(MI.getOperand(2).getReg(), AVR::sub_hi),
std::pair(MI.getOperand(2).getReg(), AVR::sub_lo),		std::pair(MI.getOperand(2).getReg(), AVR::sub_lo),
};		};

// Do the shift. The registers are modified in-place.		// Do the shift. The registers are modified in-place.
insertMultibyteShift(MI, BB, Registers, Opc, ShiftAmt);		int64_t ShiftAmt = 1;
		if (MI.getOperand(4).isImm()) {
		// The shift amount is known at compile time.
		ShiftAmt = MI.getOperand(4).getImm();
		insertMultibyteShift(MBBI, BB, MI.getDebugLoc(), Registers, Opc, ShiftAmt);
		} else {
		// The shift amount is not known at compile time. We need to create a loop.
		Register ShiftNum = MI.getOperand(4).getReg();
		BB = insertMultibyteShiftLoop(MI, BB, ShiftNum, Registers, Opc);

		// Insert REG_SEQUENCE instructions at the beginning of ExitBB.
		MBBI = BB->begin();
		}

// Combine the 8-bit registers into 16-bit register pairs.		// Combine the 8-bit registers into 16-bit register pairs.
// This done either from LSB to MSB or from MSB to LSB, depending on the		// This done either from LSB to MSB or from MSB to LSB, depending on the
// shift. It's an optimization so that the register allocator will use the		// shift. It's an optimization so that the register allocator will use the
// fewest movs possible (which order we use isn't a correctness issue, just an		// fewest movs possible (which order we use isn't a correctness issue, just an
// optimization issue).		// optimization issue).
// - lsl prefers starting from the most significant byte (2nd case).		// - lsl prefers starting from the most significant byte (2nd case).
// - lshr prefers starting from the least significant byte (1st case).		// - lshr prefers starting from the least significant byte (1st case).
// - for ashr it depends on the number of shifted bytes.		// - for ashr it depends on the number of shifted bytes.
// Some shift operations still don't get the most optimal mov sequences even		// Some shift operations still don't get the most optimal mov sequences even
// with this distinction. TODO: figure out why and try to fix it (but we're		// with this distinction. TODO: figure out why and try to fix it (but we're
// already equal to or faster than avr-gcc in all cases except ashr 8).		// already equal to or faster than avr-gcc in all cases except ashr 8).
if (Opc != ISD::SHL &&		if (Opc != ISD::SHL &&
(Opc != ISD::SRA || (ShiftAmt < 16 || ShiftAmt >= 22))) {		(Opc != ISD::SRA || (ShiftAmt < 16 || ShiftAmt >= 22))) {
// Use the resulting registers starting with the least significant byte.		// Use the resulting registers starting with the least significant byte.
BuildMI(*BB, MI, dl, TII.get(AVR::REG_SEQUENCE), MI.getOperand(0).getReg())		BuildMI(*BB, MBBI, DL, TII.get(AVR::REG_SEQUENCE),
		MI.getOperand(0).getReg())
.addReg(Registers[3].first, 0, Registers[3].second)		.addReg(Registers[3].first, 0, Registers[3].second)
.addImm(AVR::sub_lo)		.addImm(AVR::sub_lo)
.addReg(Registers[2].first, 0, Registers[2].second)		.addReg(Registers[2].first, 0, Registers[2].second)
.addImm(AVR::sub_hi);		.addImm(AVR::sub_hi);
BuildMI(*BB, MI, dl, TII.get(AVR::REG_SEQUENCE), MI.getOperand(1).getReg())		BuildMI(*BB, MBBI, DL, TII.get(AVR::REG_SEQUENCE),
		MI.getOperand(1).getReg())
.addReg(Registers[1].first, 0, Registers[1].second)		.addReg(Registers[1].first, 0, Registers[1].second)
.addImm(AVR::sub_lo)		.addImm(AVR::sub_lo)
.addReg(Registers[0].first, 0, Registers[0].second)		.addReg(Registers[0].first, 0, Registers[0].second)
.addImm(AVR::sub_hi);		.addImm(AVR::sub_hi);
} else {		} else {
// Use the resulting registers starting with the most significant byte.		// Use the resulting registers starting with the most significant byte.
BuildMI(*BB, MI, dl, TII.get(AVR::REG_SEQUENCE), MI.getOperand(1).getReg())		BuildMI(*BB, MBBI, DL, TII.get(AVR::REG_SEQUENCE),
		MI.getOperand(1).getReg())
.addReg(Registers[0].first, 0, Registers[0].second)		.addReg(Registers[0].first, 0, Registers[0].second)
.addImm(AVR::sub_hi)		.addImm(AVR::sub_hi)
.addReg(Registers[1].first, 0, Registers[1].second)		.addReg(Registers[1].first, 0, Registers[1].second)
.addImm(AVR::sub_lo);		.addImm(AVR::sub_lo);
BuildMI(*BB, MI, dl, TII.get(AVR::REG_SEQUENCE), MI.getOperand(0).getReg())		BuildMI(*BB, MBBI, DL, TII.get(AVR::REG_SEQUENCE),
		MI.getOperand(0).getReg())
.addReg(Registers[2].first, 0, Registers[2].second)		.addReg(Registers[2].first, 0, Registers[2].second)
.addImm(AVR::sub_hi)		.addImm(AVR::sub_hi)
.addReg(Registers[3].first, 0, Registers[3].second)		.addReg(Registers[3].first, 0, Registers[3].second)
.addImm(AVR::sub_lo);		.addImm(AVR::sub_lo);
}		}

// Remove the pseudo instruction.		// Remove the pseudo instruction.
MI.eraseFromParent();		MI.eraseFromParent();

llvm/lib/Target/AVR/AVRShiftExpand.cpp

This file was deleted.

	//===- AVRShift.cpp - Shift Expansion Pass --------------------------------===//
	//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//
	//===----------------------------------------------------------------------===//
	//
	/// \file
	/// Expand 32-bit shift instructions (shl, lshr, ashr) to inline loops, just
	/// like avr-gcc. This must be done in IR because otherwise the type legalizer
	/// will turn 32-bit shifts into (non-existing) library calls such as __ashlsi3.
	//
	//===----------------------------------------------------------------------===//

	#include "AVR.h"
	#include "llvm/IR/IRBuilder.h"
	#include "llvm/IR/InstIterator.h"

	using namespace llvm;

	namespace {

	class AVRShiftExpand : public FunctionPass {
	public:
	static char ID;

	AVRShiftExpand() : FunctionPass(ID) {}

	bool runOnFunction(Function &F) override;

	StringRef getPassName() const override { return "AVR Shift Expansion"; }

	private:
	void expand(BinaryOperator *BI);
	};

	} // end of anonymous namespace

	char AVRShiftExpand::ID = 0;

	INITIALIZE_PASS(AVRShiftExpand, "avr-shift-expand", "AVR Shift Expansion",
	false, false)

	Pass *llvm::createAVRShiftExpandPass() { return new AVRShiftExpand(); }

	bool AVRShiftExpand::runOnFunction(Function &F) {
	SmallVector<BinaryOperator *, 1> ShiftInsts;
	auto &Ctx = F.getContext();
	for (Instruction &I : instructions(F)) {
	if (!I.isShift())
	// Only expand shift instructions (shl, lshr, ashr).
	continue;
	if (I.getType() != Type::getInt32Ty(Ctx))
	// Only expand plain i32 types.
	continue;
	if (isa<ConstantInt>(I.getOperand(1)))
	// Only expand when the shift amount is not known.
	// Known shift amounts are (currently) better expanded inline.
	continue;
	ShiftInsts.push_back(cast<BinaryOperator>(&I));
	}

	// The expanding itself needs to be done separately as expand() will remove
	// these instructions. Removing instructions while iterating over a basic
	// block is not a great idea.
	for (auto *I : ShiftInsts) {
	expand(I);
	}

	// Return whether this function expanded any shift instructions.
	return ShiftInsts.size() > 0;
	}

	void AVRShiftExpand::expand(BinaryOperator *BI) {
	auto &Ctx = BI->getContext();
	IRBuilder<> Builder(BI);
	Type *Int32Ty = Type::getInt32Ty(Ctx);
	Type *Int8Ty = Type::getInt8Ty(Ctx);
	Value *Int8Zero = ConstantInt::get(Int8Ty, 0);

	// Split the current basic block at the point of the existing shift
	// instruction and insert a new basic block for the loop.
	BasicBlock *BB = BI->getParent();
	Function *F = BB->getParent();
	BasicBlock *EndBB = BB->splitBasicBlock(BI, "shift.done");
	BasicBlock *LoopBB = BasicBlock::Create(Ctx, "shift.loop", F, EndBB);

	// Truncate the shift amount to i8, which is trivially lowered to a single
	// AVR register.
	Builder.SetInsertPoint(&BB->back());
	Value *ShiftAmount = Builder.CreateTrunc(BI->getOperand(1), Int8Ty);

	// Replace the unconditional branch that splitBasicBlock created with a
	// conditional branch.
	Value *Cmp1 = Builder.CreateICmpEQ(ShiftAmount, Int8Zero);
	Builder.CreateCondBr(Cmp1, EndBB, LoopBB);
	BB->back().eraseFromParent();

	// Create the loop body starting with PHI nodes.
	Builder.SetInsertPoint(LoopBB);
	PHINode *ShiftAmountPHI = Builder.CreatePHI(Int8Ty, 2);
	ShiftAmountPHI->addIncoming(ShiftAmount, BB);
	PHINode *ValuePHI = Builder.CreatePHI(Int32Ty, 2);
	ValuePHI->addIncoming(BI->getOperand(0), BB);

	// Subtract the shift amount by one, as we're shifting one this loop
	// iteration.
	Value *ShiftAmountSub =
	Builder.CreateSub(ShiftAmountPHI, ConstantInt::get(Int8Ty, 1));
	ShiftAmountPHI->addIncoming(ShiftAmountSub, LoopBB);

	// Emit the actual shift instruction. The difference is that this shift
	// instruction has a constant shift amount, which can be emitted inline
	// without a library call.
	Value *ValueShifted;
	switch (BI->getOpcode()) {
	case Instruction::Shl:
	ValueShifted = Builder.CreateShl(ValuePHI, ConstantInt::get(Int32Ty, 1));
	break;
	case Instruction::LShr:
	ValueShifted = Builder.CreateLShr(ValuePHI, ConstantInt::get(Int32Ty, 1));
	break;
	case Instruction::AShr:
	ValueShifted = Builder.CreateAShr(ValuePHI, ConstantInt::get(Int32Ty, 1));
	break;
	default:
	llvm_unreachable("asked to expand an instruction that is not a shift");
	}
	ValuePHI->addIncoming(ValueShifted, LoopBB);

	// Branch to either the loop again (if there is more to shift) or to the
	// basic block after the loop (if all bits are shifted).
	Value *Cmp2 = Builder.CreateICmpEQ(ShiftAmountSub, Int8Zero);
	Builder.CreateCondBr(Cmp2, EndBB, LoopBB);

	// Collect the resulting value. This is necessary in the IR but won't produce
	// any actual instructions.
	Builder.SetInsertPoint(BI);
	PHINode *Result = Builder.CreatePHI(Int32Ty, 2);
	Result->addIncoming(BI->getOperand(0), BB);
	Result->addIncoming(ValueShifted, LoopBB);

	// Replace the original shift instruction.
	BI->replaceAllUsesWith(Result);
	BI->eraseFromParent();
	}
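For comparison, the IR that the deleted `AVRShiftExpand::expand` produced behaves roughly like the following C++. This is a hedged sketch, not the pass itself: `expandedShl` and `expandedLShr` are hypothetical names, and the labels in the comments refer to the `shift.loop` and `shift.done` blocks created above.

```cpp
#include <cassert> // only needed if you add checks such as the ones in the note below
#include <cstdint>

// Rough C++ equivalent of the expanded `shl i32 %value, %amount`.
uint32_t expandedShl(uint32_t Value, uint32_t Amount) {
  // trunc i32 %amount to i8
  uint8_t Count = static_cast<uint8_t>(Amount);
  // icmp eq + br: skip the loop entirely when the amount is zero.
  if (Count == 0)
    return Value; // shift.done
  do {
    // shift.loop: decrement the counter and shift by a constant 1.
    --Count;
    Value <<= 1;
  } while (Count != 0); // icmp eq + br back to shift.loop
  return Value; // shift.done
}

// Same structure for `lshr i32 %value, %amount`.
uint32_t expandedLShr(uint32_t Value, uint32_t Amount) {
  uint8_t Count = static_cast<uint8_t>(Amount);
  if (Count == 0)
    return Value;
  do {
    --Count;
    Value >>= 1;
  } while (Count != 0);
  return Value;
}
```

Unlike the machine-level loop in AVRISelLowering.cpp, this form tests for zero up front and then checks at the bottom of the loop. For in-range amounts both give the same result, for example `assert(expandedShl(3, 4) == 48);`.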

llvm/lib/Target/AVR/AVRTargetMachine.cpp

	//===-- AVRTargetMachine.cpp - Define TargetMachine for AVR ---------------===//			//===-- AVRTargetMachine.cpp - Define TargetMachine for AVR ---------------===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	▲ Show 20 Lines • Show All 54 Lines • ▼ Show 20 Lines
	public:			public:
	AVRPassConfig(AVRTargetMachine &TM, PassManagerBase &PM)			AVRPassConfig(AVRTargetMachine &TM, PassManagerBase &PM)
	: TargetPassConfig(TM, PM) {}			: TargetPassConfig(TM, PM) {}

	AVRTargetMachine &getAVRTargetMachine() const {			AVRTargetMachine &getAVRTargetMachine() const {
	return getTM<AVRTargetMachine>();			return getTM<AVRTargetMachine>();
	}			}

	void addIRPasses() override;
	bool addInstSelector() override;			bool addInstSelector() override;
	void addPreSched2() override;			void addPreSched2() override;
	void addPreEmitPass() override;			void addPreEmitPass() override;
	};			};
	} // namespace			} // namespace

	TargetPassConfig *AVRTargetMachine::createPassConfig(PassManagerBase &PM) {			TargetPassConfig *AVRTargetMachine::createPassConfig(PassManagerBase &PM) {
	return new AVRPassConfig(*this, PM);			return new AVRPassConfig(*this, PM);
	}			}

	void AVRPassConfig::addIRPasses() {
	// Expand instructions like
	// %result = shl i32 %n, %amount
	// to a loop so that library calls are avoided.
	addPass(createAVRShiftExpandPass());

	TargetPassConfig::addIRPasses();
	}

	extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeAVRTarget() {			extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeAVRTarget() {
	// Register the target.			// Register the target.
	RegisterTargetMachine<AVRTargetMachine> X(getTheAVRTarget());			RegisterTargetMachine<AVRTargetMachine> X(getTheAVRTarget());

	auto &PR = *PassRegistry::getPassRegistry();			auto &PR = *PassRegistry::getPassRegistry();
	initializeAVRExpandPseudoPass(PR);			initializeAVRExpandPseudoPass(PR);
	initializeAVRShiftExpandPass(PR);
	initializeAVRDAGToDAGISelPass(PR);			initializeAVRDAGToDAGISelPass(PR);
	}			}

	const AVRSubtarget *AVRTargetMachine::getSubtargetImpl() const {			const AVRSubtarget *AVRTargetMachine::getSubtargetImpl() const {
	return &SubTarget;			return &SubTarget;
	}			}

	const AVRSubtarget *AVRTargetMachine::getSubtargetImpl(const Function &) const {			const AVRSubtarget *AVRTargetMachine::getSubtargetImpl(const Function &) const {
	Show All 33 Lines

llvm/lib/Target/AVR/CMakeLists.txt

Show All 17 Lines	add_llvm_target(AVRCodeGen
AVRAsmPrinter.cpp		AVRAsmPrinter.cpp
AVRExpandPseudoInsts.cpp		AVRExpandPseudoInsts.cpp
AVRFrameLowering.cpp		AVRFrameLowering.cpp
AVRInstrInfo.cpp		AVRInstrInfo.cpp
AVRISelDAGToDAG.cpp		AVRISelDAGToDAG.cpp
AVRISelLowering.cpp		AVRISelLowering.cpp
AVRMCInstLower.cpp		AVRMCInstLower.cpp
AVRRegisterInfo.cpp		AVRRegisterInfo.cpp
AVRShiftExpand.cpp
AVRSubtarget.cpp		AVRSubtarget.cpp
AVRTargetMachine.cpp		AVRTargetMachine.cpp
AVRTargetObjectFile.cpp		AVRTargetObjectFile.cpp

DEPENDS		DEPENDS
intrinsics_gen		intrinsics_gen

LINK_COMPONENTS		LINK_COMPONENTS
Show All 19 Lines

llvm/test/CodeGen/AVR/shift-expand.ll

This file was deleted.

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -avr-shift-expand -S %s -o - | FileCheck %s

	; The avr-shift-expand pass expands large shifts with a non-constant shift
	; amount to a loop. These loops avoid generating a (non-existing) builtin such
	; as __ashlsi3.

	target datalayout = "e-P1-p:16:8-i8:8-i16:8-i32:8-i64:8-f32:8-f64:8-n8-a:8"
	target triple = "avr"

	define i32 @shl(i32 %value, i32 %amount) addrspace(1) {
	; CHECK-LABEL: @shl(
	; CHECK-NEXT: [[TMP1:%.*]] = trunc i32 [[AMOUNT:%.*]] to i8
	; CHECK-NEXT: [[TMP2:%.*]] = icmp eq i8 [[TMP1]], 0
	; CHECK-NEXT: br i1 [[TMP2]], label [[SHIFT_DONE:%.*]], label [[SHIFT_LOOP:%.*]]
	; CHECK: shift.loop:
	; CHECK-NEXT: [[TMP3:%.*]] = phi i8 [ [[TMP1]], [[TMP0:%.*]] ], [ [[TMP5:%.*]], [[SHIFT_LOOP]] ]
	; CHECK-NEXT: [[TMP4:%.*]] = phi i32 [ [[VALUE:%.*]], [[TMP0]] ], [ [[TMP6:%.*]], [[SHIFT_LOOP]] ]
	; CHECK-NEXT: [[TMP5]] = sub i8 [[TMP3]], 1
	; CHECK-NEXT: [[TMP6]] = shl i32 [[TMP4]], 1
	; CHECK-NEXT: [[TMP7:%.*]] = icmp eq i8 [[TMP5]], 0
	; CHECK-NEXT: br i1 [[TMP7]], label [[SHIFT_DONE]], label [[SHIFT_LOOP]]
	; CHECK: shift.done:
	; CHECK-NEXT: [[TMP8:%.*]] = phi i32 [ [[VALUE]], [[TMP0]] ], [ [[TMP6]], [[SHIFT_LOOP]] ]
	; CHECK-NEXT: ret i32 [[TMP8]]
	;
	%result = shl i32 %value, %amount
	ret i32 %result
	}

	define i32 @lshr(i32 %value, i32 %amount) addrspace(1) {
	; CHECK-LABEL: @lshr(
	; CHECK-NEXT: [[TMP1:%.*]] = trunc i32 [[AMOUNT:%.*]] to i8
	; CHECK-NEXT: [[TMP2:%.*]] = icmp eq i8 [[TMP1]], 0
	; CHECK-NEXT: br i1 [[TMP2]], label [[SHIFT_DONE:%.*]], label [[SHIFT_LOOP:%.*]]
	; CHECK: shift.loop:
	; CHECK-NEXT: [[TMP3:%.*]] = phi i8 [ [[TMP1]], [[TMP0:%.*]] ], [ [[TMP5:%.*]], [[SHIFT_LOOP]] ]
	; CHECK-NEXT: [[TMP4:%.*]] = phi i32 [ [[VALUE:%.*]], [[TMP0]] ], [ [[TMP6:%.*]], [[SHIFT_LOOP]] ]
	; CHECK-NEXT: [[TMP5]] = sub i8 [[TMP3]], 1
	; CHECK-NEXT: [[TMP6]] = lshr i32 [[TMP4]], 1
	; CHECK-NEXT: [[TMP7:%.*]] = icmp eq i8 [[TMP5]], 0
	; CHECK-NEXT: br i1 [[TMP7]], label [[SHIFT_DONE]], label [[SHIFT_LOOP]]
	; CHECK: shift.done:
	; CHECK-NEXT: [[TMP8:%.*]] = phi i32 [ [[VALUE]], [[TMP0]] ], [ [[TMP6]], [[SHIFT_LOOP]] ]
	; CHECK-NEXT: ret i32 [[TMP8]]
	;
	%result = lshr i32 %value, %amount
	ret i32 %result
	}

	define i32 @ashr(i32 %0, i32 %1) addrspace(1) {
	; CHECK-LABEL: @ashr(
	; CHECK-NEXT: [[TMP3:%.*]] = trunc i32 [[TMP1:%.*]] to i8
	; CHECK-NEXT: [[TMP4:%.*]] = icmp eq i8 [[TMP3]], 0
	; CHECK-NEXT: br i1 [[TMP4]], label [[SHIFT_DONE:%.*]], label [[SHIFT_LOOP:%.*]]
	; CHECK: shift.loop:
	; CHECK-NEXT: [[TMP5:%.*]] = phi i8 [ [[TMP3]], [[TMP2:%.*]] ], [ [[TMP7:%.*]], [[SHIFT_LOOP]] ]
	; CHECK-NEXT: [[TMP6:%.*]] = phi i32 [ [[TMP0:%.*]], [[TMP2]] ], [ [[TMP8:%.*]], [[SHIFT_LOOP]] ]
	; CHECK-NEXT: [[TMP7]] = sub i8 [[TMP5]], 1
	; CHECK-NEXT: [[TMP8]] = ashr i32 [[TMP6]], 1
	; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i8 [[TMP7]], 0
	; CHECK-NEXT: br i1 [[TMP9]], label [[SHIFT_DONE]], label [[SHIFT_LOOP]]
	; CHECK: shift.done:
	; CHECK-NEXT: [[TMP10:%.*]] = phi i32 [ [[TMP0]], [[TMP2]] ], [ [[TMP8]], [[SHIFT_LOOP]] ]
	; CHECK-NEXT: ret i32 [[TMP10]]
	;
	%3 = ashr i32 %0, %1
	ret i32 %3
	}

	; This function is not modified because it is not an i32.
	define i40 @shl40(i40 %value, i40 %amount) addrspace(1) {
	; CHECK-LABEL: @shl40(
	; CHECK-NEXT: [[RESULT:%.*]] = shl i40 [[VALUE:%.*]], [[AMOUNT:%.*]]
	; CHECK-NEXT: ret i40 [[RESULT]]
	;
	%result = shl i40 %value, %amount
	ret i40 %result
	}

	; This function isn't either, although perhaps it should.
	define i24 @shl24(i24 %value, i24 %amount) addrspace(1) {
	; CHECK-LABEL: @shl24(
	; CHECK-NEXT: [[RESULT:%.*]] = shl i24 [[VALUE:%.*]], [[AMOUNT:%.*]]
	; CHECK-NEXT: ret i24 [[RESULT]]
	;
	%result = shl i24 %value, %amount
	ret i24 %result
	}

llvm/test/CodeGen/AVR/shift-loop.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
				; RUN: llc < %s -mtriple=avr -verify-machineinstrs -stop-after=dead-mi-elimination | FileCheck %s

				; This test shows the machine IR that is generated when lowering a shift
				; operation to a loop.

				define i32 @shl_i32_n(i32 %a, i32 %b) #0 {
				; CHECK-LABEL: name: shl_i32_n
				; CHECK: bb.0 (%ir-block.0):
				; CHECK-NEXT: successors: %bb.1(0x80000000)
				; CHECK-NEXT: liveins: $r23r22, $r25r24, $r19r18
				; CHECK-NEXT: {{ $}}
				; CHECK-NEXT: [[COPY:%[0-9]+]]:dregs = COPY $r19r18
				; CHECK-NEXT: [[COPY1:%[0-9]+]]:dregs = COPY $r25r24
				; CHECK-NEXT: [[COPY2:%[0-9]+]]:dregs = COPY $r23r22
				; CHECK-NEXT: [[COPY3:%[0-9]+]]:gpr8 = COPY [[COPY]].sub_lo
				; CHECK-NEXT: {{ $}}
				; CHECK-NEXT: bb.1 (%ir-block.0):
				; CHECK-NEXT: successors: %bb.2(0x40000000), %bb.3(0x40000000)
				; CHECK-NEXT: {{ $}}
				; CHECK-NEXT: [[PHI:%[0-9]+]]:gpr8 = PHI [[COPY1]].sub_hi, %bb.0, %15, %bb.2
				; CHECK-NEXT: [[PHI1:%[0-9]+]]:gpr8 = PHI [[COPY1]].sub_lo, %bb.0, %14, %bb.2
				; CHECK-NEXT: [[PHI2:%[0-9]+]]:gpr8 = PHI [[COPY2]].sub_hi, %bb.0, %13, %bb.2
				; CHECK-NEXT: [[PHI3:%[0-9]+]]:gpr8 = PHI [[COPY2]].sub_lo, %bb.0, %12, %bb.2
				; CHECK-NEXT: [[PHI4:%[0-9]+]]:gpr8 = PHI [[COPY3]], %bb.0, %17, %bb.2
				; CHECK-NEXT: [[DECRd:%[0-9]+]]:gpr8 = DECRd [[PHI4]], implicit-def $sreg
				; CHECK-NEXT: BRMIk %bb.3, implicit $sreg
				; CHECK-NEXT: {{ $}}
				; CHECK-NEXT: bb.2 (%ir-block.0):
				; CHECK-NEXT: successors: %bb.1(0x80000000)
				; CHECK-NEXT: {{ $}}
				; CHECK-NEXT: [[ADDRdRr:%[0-9]+]]:gpr8 = ADDRdRr [[PHI3]], [[PHI3]], implicit-def $sreg
				; CHECK-NEXT: [[ADCRdRr:%[0-9]+]]:gpr8 = ADCRdRr [[PHI2]], [[PHI2]], implicit-def $sreg, implicit $sreg
				; CHECK-NEXT: [[ADCRdRr1:%[0-9]+]]:gpr8 = ADCRdRr [[PHI1]], [[PHI1]], implicit-def $sreg, implicit $sreg
				; CHECK-NEXT: [[ADCRdRr2:%[0-9]+]]:gpr8 = ADCRdRr [[PHI]], [[PHI]], implicit-def $sreg, implicit $sreg
				; CHECK-NEXT: RJMPk %bb.1
				; CHECK-NEXT: {{ $}}
				; CHECK-NEXT: bb.3 (%ir-block.0):
				; CHECK-NEXT: [[REG_SEQUENCE:%[0-9]+]]:dregs = REG_SEQUENCE [[PHI]], %subreg.sub_hi, [[PHI1]], %subreg.sub_lo
				; CHECK-NEXT: [[REG_SEQUENCE1:%[0-9]+]]:dregs = REG_SEQUENCE [[PHI2]], %subreg.sub_hi, [[PHI3]], %subreg.sub_lo
				; CHECK-NEXT: $r23r22 = COPY [[REG_SEQUENCE1]]
				; CHECK-NEXT: $r25r24 = COPY [[REG_SEQUENCE]]
				; CHECK-NEXT: RET implicit $r23r22, implicit $r25r24, implicit $r1
				%res = shl i32 %a, %b
				ret i32 %res
				}
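
The CHECK lines above describe a loop that decrements an 8-bit counter, exits once the result goes negative, and otherwise doubles the 32-bit value one byte at a time with add/adc. A minimal C++ sketch of that behavior follows. The function name `shl32_loop` and the array `b` are hypothetical and do not appear in the patch; the sketch only models the visible machine IR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the loop in the CHECK lines; not code from the patch.
// b[0]..b[3] stand in for the four gpr8 values (low byte first) and `counter`
// for the 8-bit copy of the shift amount.
uint32_t shl32_loop(uint32_t value, uint8_t amount) {
  // A shift amount >= the bit width yields poison in LLVM IR, so it is not modeled.
  assert(amount < 32);
  uint8_t b[4] = {static_cast<uint8_t>(value),
                  static_cast<uint8_t>(value >> 8),
                  static_cast<uint8_t>(value >> 16),
                  static_cast<uint8_t>(value >> 24)};
  uint8_t counter = amount;
  for (;;) {
    counter = static_cast<uint8_t>(counter - 1);  // DECRd
    if (counter & 0x80) break;                    // BRMIk: leave once the result is negative
    unsigned carry = 0;
    for (int i = 0; i < 4; ++i) {                 // ADDRdRr, then three ADCRdRr
      unsigned sum = b[i] + b[i] + carry;
      b[i] = static_cast<uint8_t>(sum);
      carry = sum >> 8;                           // carry into the next byte
    }
  }
  return b[0] | (uint32_t(b[1]) << 8) | (uint32_t(b[2]) << 16) |
         (uint32_t(b[3]) << 24);
}
```

Because the counter is decremented before the test, a shift amount of zero leaves the loop immediately without touching the value.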

llvm/test/CodeGen/AVR/shift32.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=avr -mattr=movw -verify-machineinstrs | FileCheck %s			; RUN: llc < %s -mtriple=avr -mattr=movw -verify-machineinstrs | FileCheck %s

				; Shift by a number unknown at compile time.
				; The 'optsize' attribute is set to avoid duplicating part of the loop.
				; TODO: it is more efficient to jump at the start and do the check where the
				; 'rjmp' is now. The branch relaxation pass puts them in this non-optimal order.

				define i32 @shl_i32_n(i32 %a, i32 %b) #0 {
				; CHECK-LABEL: shl_i32_n:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: .LBB0_1: ; =>This Inner Loop Header: Depth=1
				; CHECK-NEXT: dec r18
				; CHECK-NEXT: brmi .LBB0_3
				; CHECK-NEXT: ; %bb.2: ; in Loop: Header=BB0_1 Depth=1
				; CHECK-NEXT: lsl r22
				; CHECK-NEXT: rol r23
				; CHECK-NEXT: rol r24
				; CHECK-NEXT: rol r25
				; CHECK-NEXT: rjmp .LBB0_1
				; CHECK-NEXT: .LBB0_3:
				; CHECK-NEXT: ret
				%res = shl i32 %a, %b
				ret i32 %res
				}

				define i32 @lshr_i32_n(i32 %a, i32 %b) #0 {
				; CHECK-LABEL: lshr_i32_n:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: .LBB1_1: ; =>This Inner Loop Header: Depth=1
				; CHECK-NEXT: dec r18
				; CHECK-NEXT: brmi .LBB1_3
				; CHECK-NEXT: ; %bb.2: ; in Loop: Header=BB1_1 Depth=1
				; CHECK-NEXT: lsr r25
				; CHECK-NEXT: ror r24
				; CHECK-NEXT: ror r23
				; CHECK-NEXT: ror r22
				; CHECK-NEXT: rjmp .LBB1_1
				; CHECK-NEXT: .LBB1_3:
				; CHECK-NEXT: ret
				%res = lshr i32 %a, %b
				ret i32 %res
				}

				define i32 @ashr_i32_n(i32 %a, i32 %b) #0 {
				; CHECK-LABEL: ashr_i32_n:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: .LBB2_1: ; =>This Inner Loop Header: Depth=1
				; CHECK-NEXT: dec r18
				; CHECK-NEXT: brmi .LBB2_3
				; CHECK-NEXT: ; %bb.2: ; in Loop: Header=BB2_1 Depth=1
				; CHECK-NEXT: asr r25
				; CHECK-NEXT: ror r24
				; CHECK-NEXT: ror r23
				; CHECK-NEXT: ror r22
				; CHECK-NEXT: rjmp .LBB2_1
				; CHECK-NEXT: .LBB2_3:
				; CHECK-NEXT: ret
				%res = ashr i32 %a, %b
				ret i32 %res
				}
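
The two right-shift loops above differ only in the first instruction: lsr shifts a zero into the top byte, while asr keeps the sign bit. The remaining bytes are rotated through the carry with ror, from high to low. A rough C++ sketch under those assumptions follows. The name `shr32_loop` and its `arithmetic` flag are hypothetical and are not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the lshr/ashr loops; not code from the patch.
// `arithmetic` selects asr (keep the sign bit) over lsr (shift in zero).
uint32_t shr32_loop(uint32_t value, uint8_t amount, bool arithmetic) {
  // A shift amount >= the bit width yields poison in LLVM IR, so it is not modeled.
  assert(amount < 32);
  uint8_t b[4] = {static_cast<uint8_t>(value),
                  static_cast<uint8_t>(value >> 8),
                  static_cast<uint8_t>(value >> 16),
                  static_cast<uint8_t>(value >> 24)};
  uint8_t counter = amount;
  for (;;) {
    counter = static_cast<uint8_t>(counter - 1);      // dec r18
    if (counter & 0x80) break;                        // brmi
    unsigned carry = b[3] & 1u;                       // bit shifted out of r25
    uint8_t sign = arithmetic ? (b[3] & 0x80) : 0;
    b[3] = static_cast<uint8_t>((b[3] >> 1) | sign);  // asr r25 or lsr r25
    for (int i = 2; i >= 0; --i) {                    // ror r24, ror r23, ror r22
      unsigned out = b[i] & 1u;
      b[i] = static_cast<uint8_t>((b[i] >> 1) | (carry << 7));
      carry = out;
    }
  }
  return b[0] | (uint32_t(b[1]) << 8) | (uint32_t(b[2]) << 16) |
         (uint32_t(b[3]) << 24);
}
```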

				; Shift by a constant known at compile time.

	define i32 @shl_i32_1(i32 %a) {			define i32 @shl_i32_1(i32 %a) {
	; CHECK-LABEL: shl_i32_1:			; CHECK-LABEL: shl_i32_1:
	; CHECK: ; %bb.0:			; CHECK: ; %bb.0:
	; CHECK-NEXT: lsl r22			; CHECK-NEXT: lsl r22
	; CHECK-NEXT: rol r23			; CHECK-NEXT: rol r23
	; CHECK-NEXT: rol r24			; CHECK-NEXT: rol r24
	; CHECK-NEXT: rol r25			; CHECK-NEXT: rol r25
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	[558 lines not shown]
	; CHECK-NEXT: lsl r25			; CHECK-NEXT: lsl r25
	; CHECK-NEXT: sbc r22, r22			; CHECK-NEXT: sbc r22, r22
	; CHECK-NEXT: mov r23, r22			; CHECK-NEXT: mov r23, r22
	; CHECK-NEXT: movw r24, r22			; CHECK-NEXT: movw r24, r22
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%res = ashr i32 %a, 31			%res = ashr i32 %a, 31
	ret i32 %res			ret i32 %res
	}			}

				attributes #0 = { optsize }
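
The visible tail of the `ashr i32 %a, 31` test (lsl, sbc, mov, movw) needs no loop: the sign bit is moved into the carry flag, and subtracting a register from itself with borrow produces 0x00 or 0xFF, which is then copied to every byte. A small C++ sketch of that trick follows. It assumes the elided lines contain nothing else for this function, and the name `ashr32_by_31` is hypothetical.

```cpp
#include <cstdint>

// Hypothetical model of the lsl/sbc/mov/movw sequence; not code from the patch.
uint32_t ashr32_by_31(uint32_t value) {
  uint8_t r25 = static_cast<uint8_t>(value >> 24);
  unsigned carry = r25 >> 7;                       // lsl r25: sign bit lands in carry
  uint8_t r22 = static_cast<uint8_t>(0u - carry);  // sbc r22, r22: 0x00 or 0xFF
  uint8_t r23 = r22;                               // mov r23, r22
  uint8_t r24 = r22;                               // movw r24, r22 copies the pair
  r25 = r23;                                       //   r23:r22 into r25:r24
  return r22 | (uint32_t(r23) << 8) | (uint32_t(r24) << 16) |
         (uint32_t(r25) << 24);
}
```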

llvm/utils/gn/secondary/llvm/lib/Target/AVR/BUILD.gn

[31 lines not shown]
sources = [
"AVRAsmPrinter.cpp",		"AVRAsmPrinter.cpp",
"AVRExpandPseudoInsts.cpp",		"AVRExpandPseudoInsts.cpp",
"AVRFrameLowering.cpp",		"AVRFrameLowering.cpp",
"AVRISelDAGToDAG.cpp",		"AVRISelDAGToDAG.cpp",
"AVRISelLowering.cpp",		"AVRISelLowering.cpp",
"AVRInstrInfo.cpp",		"AVRInstrInfo.cpp",
"AVRMCInstLower.cpp",		"AVRMCInstLower.cpp",
"AVRRegisterInfo.cpp",		"AVRRegisterInfo.cpp",
"AVRShiftExpand.cpp",
"AVRSubtarget.cpp",		"AVRSubtarget.cpp",
"AVRTargetMachine.cpp",		"AVRTargetMachine.cpp",
"AVRTargetObjectFile.cpp",		"AVRTargetObjectFile.cpp",
]		]
}		}

# This is a bit different from most build files: Due to this group		# This is a bit different from most build files: Due to this group
# having the directory's name, "//llvm/lib/Target/AVR" will refer to this		# having the directory's name, "//llvm/lib/Target/AVR" will refer to this
[13 lines not shown]

Revision Contents

Diff 532386
