This is an archive of the discontinued LLVM Phabricator instance.

Differential D128572

[LoongArch] Add codegen support for division operations
Closed, Public

Authored by SixWeining on Jun 24 2022, 8:37 PM.

Details

Reviewers

xen0n
MaskRay
xry111

Commits

rGd29215790f0f: [LoongArch] Add codegen support for division operations

Summary

These operations include sdiv/udiv/srem/urem.

As described in the ISA manual [https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#_div_wudu_mod_wudu],
when the divisor is 0 the result can be any value, but no exception
is triggered. Unlike GCC, which by default emits code that checks
for division by zero after the division or modulus instruction,
we emit this check only when the -loongarch-check-zero-division
option is passed.
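
As a rough illustration of what the optional check means, here is a minimal C++ model of an unsigned 64-bit division followed by the `bnez`/`break 7` pair. It is only a sketch: `udivModel` and its parameters are hypothetical names, `std::nullopt` stands in for the trap, and the value 0 for a zero divisor is an arbitrary placeholder, since the ISA allows any value there.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical model of `div.du` plus the optional zero check; not LLVM code.
// std::nullopt stands in for the trap raised by `break 7`.
std::optional<uint64_t> udivModel(uint64_t dividend, uint64_t divisor,
                                  bool checkZeroDivision) {
  // div.du $dst, $dividend, $divisor
  // The instruction itself never traps. For a zero divisor the ISA permits
  // any result; 0 is an arbitrary placeholder chosen for this sketch.
  uint64_t dst = (divisor != 0) ? dividend / divisor : 0;

  if (checkZeroDivision) {
    // bnez $divisor, 8   (branch over the 4-byte break when divisor != 0)
    // break 7
    if (divisor == 0)
      return std::nullopt;
  }
  return dst;
}
```

Note that the check runs after the division, which is safe only because the division instruction does not trap on its own.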

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

SixWeining created this revision. (Jun 24 2022, 8:37 PM)

Herald added a project: Restricted Project. (Jun 24 2022, 8:37 PM)

Herald added subscribers: StephenFan, hiraditya.

SixWeining requested review of this revision. (Jun 24 2022, 8:37 PM)

Herald added a project: Restricted Project. (Jun 24 2022, 8:37 PM)

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B171991: Diff 439948. (Jun 24 2022, 9:29 PM)

xry111 added inline comments. (Jun 24 2022, 10:45 PM)

llvm/test/CodeGen/LoongArch/ir-instruction/sdiv-udiv-srem-urem.ll
133	It looks suboptimal: "div.w $a0, $a0, $a1" should work so these two sign-extensions are not needed. I'm not sure if it's easy to optimize this. If an optimization is not suitable for this revision, we can do it later.

Trapping division/modulus operations are signatures of MIPS codegen, and indeed here the trapping-by-default behavior and the flag seem to come from MIPS. However, as division-by-zero in LLVM IR is undefined behavior, why can't we just omit the trapping behavior altogether (and match RISCV in this regard), or at least disable the trapping by default?

SixWeining added inline comments. (Jun 25 2022, 1:02 AM)

llvm/test/CodeGen/LoongArch/ir-instruction/sdiv-udiv-srem-urem.ll
133	Yes. A 32-bit division can be lowered to div.w, but we must make sure the inputs are sign-extended values. This limitation is noted in the Chinese ISA document but not in the English one. Maybe the English version is outdated.

define i32 @sdiv_i32(i32 %a, i32 %b) {
entry:
  %r = sdiv i32 %a, %b
  ret i32 %r
}
=>
; LA64-NOTRAP-NEXT: addi.w $a1, $a1, 0
; LA64-NOTRAP-NEXT: addi.w $a0, $a0, 0
; LA64-NOTRAP-NEXT: div.w $a0, $a0, $a1

define i32 @sdiv_i32(i32 signext %a, i32 signext %b) {
entry:
  %r = sdiv i32 %a, %b
  ret i32 %r
}
=>
; LA64-NOTRAP-NEXT: div.w $a0, $a0, $a1

Since this is an improvement to the codegen, let me implement it in a separate patch in the future. Thanks.

xen0n added inline comments. (Jun 25 2022, 1:20 AM)

llvm/test/CodeGen/LoongArch/ir-instruction/sdiv-udiv-srem-urem.ll
133	Indeed; however, the limitation seems highly likely to be a hardware erratum, because almost every other 32-bit operation on LA64 silently ignores the upper bits. I think it's better to confirm with the hardware team, in case this is another error fixed in the translation.

In D128572#3609897, @xen0n wrote:

Trapping division/modulus operations are signatures of MIPS codegen, and indeed here the trapping-by-default behavior and the flag seem to come from MIPS. However, as division-by-zero in LLVM IR is undefined behavior, why can't we just omit the trapping behavior altogether (and match RISCV in this regard), or at least disable the trapping by default?

Good question! I have wondered about this too. In fact, I don't know why MIPS needs to check for division by zero by default. RISC-V and AArch64 do not check, and I'm not sure whether that is because their ISAs define a fixed result for division by zero while MIPS and LoongArch do not.
Currently the upstreamed GCC checks for division by zero by default, so I do the same here to stay consistent.

I just experimented on 3A5000 and it seems the "undefined result" is all-zeros, and unfortunately the div.w/mod.w UB when input operand is non-canonical (i.e. non-sign-extended) 32-bit is indeed present; in this case the output is all-zeros too.

So the sign-extension is indeed necessary for inputs not statically known to be signext. The all-zeros in case of UB is less useful than RISCV's all-ones, in terms of expected semantics (we want ideally something near "infinity"), but it's UB after all, and 0 is equally okay here.

In D128572#3609943, @SixWeining wrote:

In D128572#3609897, @xen0n wrote:

Trapping division/modulus operations are signatures of MIPS codegen, and indeed here the trapping-by-default behavior and the flag seem to come from MIPS. However, as division-by-zero in LLVM IR is undefined behavior, why can't we just omit the trapping behavior altogether (and match RISCV in this regard), or at least disable the trapping by default?

Good question! I have wondered about this too. In fact, I don't know why MIPS needs to check for division by zero by default. RISC-V and AArch64 do not check, and I'm not sure whether that is because their ISAs define a fixed result for division by zero while MIPS and LoongArch do not.
Currently the upstreamed GCC checks for division by zero by default, so I do the same here to stay consistent.

Might be better to just flip the switch on gcc upstream. @xry111 has kindly forwarded my comment to the Loongson GCC issue tracker so we could continue discussion there regarding gcc.

In D128572#3609943, @SixWeining wrote:

RISC-V and AArch64 do not check, and I'm not sure whether that is because their ISAs define a fixed result for division by zero while MIPS and LoongArch do not.

Also, technically LoongArch could be revised to produce similarly defined results as RISC-V or AArch64, because anything is compatible with "undefined" so we won't break compatibility with the v1.00 ISA semantics. You could discuss this with your hardware team too if you want; IMO the RISC-V approach seems very reasonable, for example.
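
For reference, the RISC-V behavior mentioned here can be sketched in C++ as below. This models the 64-bit signed DIV/REM edge cases as the RISC-V specification defines them (all-ones quotient and the dividend as remainder for a zero divisor). The struct and function names are hypothetical, and nothing in this sketch describes what LoongArch hardware does.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper types and names; this models RISC-V, not LoongArch.
struct DivRem {
  int64_t quot;
  int64_t rem;
};

DivRem riscvStyleDivRem(int64_t a, int64_t b) {
  if (b == 0)
    return {-1, a}; // division by zero: all-ones quotient, remainder = dividend
  if (a == std::numeric_limits<int64_t>::min() && b == -1)
    return {a, 0};  // signed overflow: quotient = dividend, remainder = 0
  return {a / b, a % b}; // ordinary case, truncating toward zero
}
```

Because every input has a defined result, a compiler targeting such an ISA needs neither a trap sequence nor special handling for these cases.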

xry111 added a comment. (Jun 25 2022, 1:47 AM)

This comment was removed by xry111.

llvm/test/CodeGen/LoongArch/ir-instruction/sdiv-udiv-srem-urem.ll
133	GCC simply generates a div.w instruction.

In D128572#3609951, @xry111 wrote:

In D128572#3609944, @xen0n wrote:

I just experimented on 3A5000 and it seems the "undefined result" is all-zeros, and unfortunately the div.w/mod.w UB when input operand is non-canonical (i.e. non-sign-extended) 32-bit is indeed present; in this case the output is all-zeros too.

So the sign-extension is indeed necessary for inputs not statically known to be signext. The all-zeros in case of UB is less useful than RISCV's all-ones, in terms of expected semantics (we want ideally something near "infinity"), but it's UB after all, and 0 is equally okay here.

My mistake: I saw GCC generate "div.w $a0, $a0, $a1", but it only works because the ABI has made sure that the parameters are already sign-extended.

Indeed. I experimented further and it's a whole can of worms with non-canonical inputs to {div/mod}.w:

# a = 4294967396 (0x0000000100000064)
# b = 5 (0x0000000000000005)
div.w(a, b) = 0  # 0x0000000000000000
mod.w(a, b) = 0  # 0x0000000000000000

# a = 4294967396 (0x0000000100000064)
# b = -5 (0xfffffffffffffffb)
div.w(a, b) = 0  # 0x0000000000000000
mod.w(a, b) = 0  # 0x0000000000000000

# a = 100 (0x0000000000000064)
# b = 8589934597 (0x0000000200000005)
div.w(a, b) = 85899345920  # 0x0000001400000000
mod.w(a, b) = 100  # 0x0000000000000064

# a = 100 (0x0000000000000064)
# b = -8589934597 (0xfffffffdfffffffb)
div.w(a, b) = -85899345920  # 0xffffffec00000000
mod.w(a, b) = 100  # 0x0000000000000064

The result is even more unintelligible if the divisor is non-canonical... let's just ensure signext one way or another (ABI, signext, legalization, etc.) and never invoke the UB. It's not fun.
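
To make "non-canonical" concrete, the following C++ sketch checks whether a 64-bit register value is a sign-extended 32-bit value and shows the effect of the `addi.w $rd, $rj, 0` sign-extension that the test output relies on. The helper names `signExtend32` and `isCanonical32` are hypothetical; the constants are the inputs from the experiment above.

```cpp
#include <cstdint>

// Hypothetical model of `addi.w $rd, $rj, 0`: keep the low 32 bits of the
// register and sign-extend them to 64 bits.
constexpr int64_t signExtend32(int64_t reg) {
  return static_cast<int64_t>(
      static_cast<int32_t>(static_cast<uint32_t>(reg)));
}

// A register holds a canonical 32-bit value when sign-extending its low
// 32 bits reproduces the full 64-bit contents.
constexpr bool isCanonical32(int64_t reg) { return signExtend32(reg) == reg; }

// Inputs taken from the experiment above.
static_assert(isCanonical32(5) && isCanonical32(-5), "canonical operands");
static_assert(!isCanonical32(4294967396LL), "0x0000000100000064");
static_assert(!isCanonical32(8589934597LL), "0x0000000200000005");
static_assert(signExtend32(4294967396LL) == 100, "low word is 0x64");
```

Feeding each operand through such a sign-extension before {div/mod}.w is what keeps the instruction inside its defined behavior.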

It seems I can't remove an inline comment...

llvm/test/CodeGen/LoongArch/ir-instruction/sdiv-udiv-srem-urem.ll
133	My mistake: I saw GCC generate "div.w $a0, $a0, $a1", but it only works because in the ABI parameters are already sign-extended.

disable trapping by default

SixWeining edited the summary of this revision. (Jun 27 2022, 6:30 AM)

Harbormaster completed remote builds in B172177: Diff 440197. (Jun 27 2022, 7:23 AM)

LGTM, thanks!

(I've checked harder and apparently the weird "erratum" is in fact a wart carried over from MIPS. The MIPS ISA manual contains the very same wording regarding non-canonical inputs to 32-bit division/modulus operations. Let's just hope this gets fixed in future models...)

This revision is now accepted and ready to land. (Jun 27 2022, 11:03 PM)

I'll submit a patch to change the GCC behavior later (after fixing PR106096, which is blocking bootstrap).

In D128572#3614246, @xen0n wrote:

LGTM, thanks!

(I've checked harder and apparently the weird "erratum" is in fact a wart carried over from MIPS. The MIPS ISA manual contains the very same wording regarding non-canonical inputs to 32-bit division/modulus operations. Let's just hope this gets fixed in future models...)

I guess the ALU of the 3A5000 does not deviate much from that of the 3A4000, which forces LA v1.00 to keep such division results undefined. Still, in a complex product like a CPU it's usually a good thing to replace components one by one instead of replacing many at once.

xen0n added inline comments. (Jul 2 2022, 5:33 PM)

llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
401	Didn't notice before; this should be "mod".

SixWeining added inline comments. (Jul 3 2022, 5:47 PM)

llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
401	Thanks. Let me fix it.

Change 'rem' to 'mod' in comments.

Harbormaster completed remote builds in B173478: Diff 441988. (Jul 3 2022, 6:46 PM)

This revision was landed with ongoing or failed builds. (Jul 6 2022, 2:55 AM)

Closed by commit rGd29215790f0f: [LoongArch] Add codegen support for division operations (authored by SixWeining).

This revision was automatically updated to reflect the committed changes.

SixWeining added a commit: rGd29215790f0f: [LoongArch] Add codegen support for division operations.

SixWeining removed a parent revision: D128433: [LoongArch] Add LoongArch support to update_llc_test_checks. (Jul 6 2022, 2:57 AM)

SixWeining edited the summary of this revision.

Revision Contents

Path                                                                  Size
llvm/lib/Target/LoongArch/LoongArchISelLowering.h                     4 lines
llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp                   53 lines
llvm/lib/Target/LoongArch/LoongArchInstrInfo.td                       12 lines
llvm/lib/Target/LoongArch/LoongArchMCInstLower.cpp                    5 lines
llvm/test/CodeGen/LoongArch/ir-instruction/sdiv-udiv-srem-urem.ll     685 lines

Diff 442490

llvm/lib/Target/LoongArch/LoongArchISelLowering.h

[... 86 lines not shown ...]
   void analyzeInputArgs(CCState &CCInfo,
                         LoongArchCCAssignFn Fn) const;
   void analyzeOutputArgs(CCState &CCInfo,
                          const SmallVectorImpl<ISD::OutputArg> &Outs,
                          LoongArchCCAssignFn Fn) const;

   SDValue lowerGlobalAddress(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerShiftLeftParts(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerShiftRightParts(SDValue Op, SelectionDAG &DAG, bool IsSRA) const;

+  MachineBasicBlock *
+  EmitInstrWithCustomInserter(MachineInstr &MI,
+                              MachineBasicBlock *BB) const override;
 };

 } // end namespace llvm

 #endif // LLVM_LIB_TARGET_LOONGARCH_LOONGARCHISELLOWERING_H

llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp

[... 20 lines not shown ...]
 #include "llvm/ADT/Statistic.h"
 #include "llvm/CodeGen/ISDOpcodes.h"
 #include "llvm/Support/Debug.h"

 using namespace llvm;

 #define DEBUG_TYPE "loongarch-isel-lowering"

+static cl::opt<bool> ZeroDivCheck(
+    "loongarch-check-zero-division", cl::Hidden,
+    cl::desc("Trap on integer division by zero."),
+    cl::init(false));

 LoongArchTargetLowering::LoongArchTargetLowering(const TargetMachine &TM,
                                                  const LoongArchSubtarget &STI)
     : TargetLowering(TM), Subtarget(STI) {

   MVT GRLenVT = Subtarget.getGRLenVT();
   // Set up the register classes.
   addRegisterClass(GRLenVT, &LoongArch::GPRRegClass);
   if (Subtarget.hasBasicF())
[... 344 lines not shown; context: SDValue LoongArchTargetLowering::PerformDAGCombine(SDNode *N, ...]
   case ISD::AND:
     return performANDCombine(N, DAG, DCI, Subtarget);
   case ISD::SRL:
     return performSRLCombine(N, DAG, DCI, Subtarget);
   }
   return SDValue();
 }

+static MachineBasicBlock *insertDivByZeroTrap(MachineInstr &MI,
+                                              MachineBasicBlock &MBB,
+                                              const TargetInstrInfo &TII) {
+  if (!ZeroDivCheck)
+    return &MBB;
+
+  // Build instructions:
+  //   div(or mod) $dst, $dividend, $divisor
+  //   bnez $divisor, 8
+  //   break 7
+  //   fallthrough
+  MachineOperand &Divisor = MI.getOperand(2);
+  auto FallThrough = std::next(MI.getIterator());
+
+  BuildMI(MBB, FallThrough, MI.getDebugLoc(), TII.get(LoongArch::BNEZ))
+      .addReg(Divisor.getReg(), getKillRegState(Divisor.isKill()))
+      .addImm(8);
+
+  // See linux header file arch/loongarch/include/uapi/asm/break.h for the
+  // definition of BRK_DIVZERO.
+  BuildMI(MBB, FallThrough, MI.getDebugLoc(), TII.get(LoongArch::BREAK))
+      .addImm(7 /*BRK_DIVZERO*/);
+
+  // Clear Divisor's kill flag.
+  Divisor.setIsKill(false);
+
+  return &MBB;
+}
+
+MachineBasicBlock *LoongArchTargetLowering::EmitInstrWithCustomInserter(
+    MachineInstr &MI, MachineBasicBlock *BB) const {
+
+  switch (MI.getOpcode()) {
+  default:
+    llvm_unreachable("Unexpected instr type to insert");
+  case LoongArch::DIV_W:
+  case LoongArch::DIV_WU:
+  case LoongArch::MOD_W:
+  case LoongArch::MOD_WU:
+  case LoongArch::DIV_D:
+  case LoongArch::DIV_DU:
+  case LoongArch::MOD_D:
+  case LoongArch::MOD_DU:
+    return insertDivByZeroTrap(MI, *BB, *Subtarget.getInstrInfo());
+    break;
+  }
+}

 const char *LoongArchTargetLowering::getTargetNodeName(unsigned Opcode) const {
   switch ((LoongArchISD::NodeType)Opcode) {
   case LoongArchISD::FIRST_NUMBER:
     break;

 #define NODE_NAME_CASE(node)                                                   \
   case LoongArchISD::node:                                                     \
     return "LoongArchISD::" #node;
[... 289 lines not shown ...]

llvm/lib/Target/LoongArch/LoongArchInstrInfo.td

[... 304 lines not shown ...]
 def ANDN : ALU_3R<0b00000000000101101, "andn">;
 def ORN : ALU_3R<0b00000000000101100, "orn">;
 def ANDI : ALU_2RI12<0b0000001101, "andi", uimm12>;
 def ORI : ALU_2RI12<0b0000001110, "ori", uimm12>;
 def XORI : ALU_2RI12<0b0000001111, "xori", uimm12>;
 def MUL_W : ALU_3R<0b00000000000111000, "mul.w">;
 def MULH_W : ALU_3R<0b00000000000111001, "mulh.w">;
 def MULH_WU : ALU_3R<0b00000000000111010, "mulh.wu">;
+let usesCustomInserter = true in {
 def DIV_W : ALU_3R<0b00000000001000000, "div.w">;
 def MOD_W : ALU_3R<0b00000000001000001, "mod.w">;
 def DIV_WU : ALU_3R<0b00000000001000010, "div.wu">;
 def MOD_WU : ALU_3R<0b00000000001000011, "mod.wu">;
+} // usesCustomInserter = true

 // Bit-shift Instructions
 def SLL_W : ALU_3R<0b00000000000101110, "sll.w">;
 def SRL_W : ALU_3R<0b00000000000101111, "srl.w">;
 def SRA_W : ALU_3R<0b00000000000110000, "sra.w">;
 def ROTR_W : ALU_3R<0b00000000000110110, "rotr.w">;

 def SLLI_W : ALU_2RI5<0b00000000010000001, "slli.w", uimm5>;
[... 85 lines not shown ...]
 }
 def LU52I_D : ALU_2RI12<0b0000001100, "lu52i.d", simm12>;
 def PCADDU18I : ALU_1RI20<0b0001111, "pcaddu18i", simm20>;
 def MUL_D : ALU_3R<0b00000000000111011, "mul.d">;
 def MULH_D : ALU_3R<0b00000000000111100, "mulh.d">;
 def MULH_DU : ALU_3R<0b00000000000111101, "mulh.du">;
 def MULW_D_W : ALU_3R<0b00000000000111110, "mulw.d.w">;
 def MULW_D_WU : ALU_3R<0b00000000000111111, "mulw.d.wu">;
+let usesCustomInserter = true in {
 def DIV_D : ALU_3R<0b00000000001000100, "div.d">;
 def MOD_D : ALU_3R<0b00000000001000101, "mod.d">;
 def DIV_DU : ALU_3R<0b00000000001000110, "div.du">;
 def MOD_DU : ALU_3R<0b00000000001000111, "mod.du">;
+} // usesCustomInserter = true

 // Bit-shift Instructions for 64-bits
 def SLL_D : ALU_3R<0b00000000000110001, "sll.d">;
 def SRL_D : ALU_3R<0b00000000000110010, "srl.d">;
 def SRA_D : ALU_3R<0b00000000000110011, "sra.d">;
 def ROTR_D : ALU_3R<0b00000000000110111, "rotr.d">;
 def SLLI_D : ALU_2RI6<0b0000000001000001, "slli.d", uimm6>;
 def SRLI_D : ALU_2RI6<0b0000000001000101, "srli.d", uimm6>;
[... 157 lines not shown ...]
 class shiftopw<SDPatternOperator operator>
   : PatFrag<(ops node:$val, node:$count),
             (operator node:$val, (i64 (shiftMask32 node:$count)))>;

 let Predicates = [IsLA32] in {
 def : PatGprGpr<add, ADD_W>;
 def : PatGprImm<add, ADDI_W, simm12>;
 def : PatGprGpr<sub, SUB_W>;
+def : PatGprGpr<sdiv, DIV_W>;
+def : PatGprGpr<udiv, DIV_WU>;
+def : PatGprGpr<srem, MOD_W>;
+def : PatGprGpr<urem, MOD_WU>;
 } // Predicates = [IsLA32]

 let Predicates = [IsLA64] in {
 def : PatGprGpr<add, ADD_D>;
 def : PatGprGpr_32<add, ADD_W>;
 def : PatGprImm<add, ADDI_D, simm12>;
 def : PatGprImm_32<add, ADDI_W, simm12>;
 def : PatGprGpr<sub, SUB_D>;
 def : PatGprGpr_32<sub, SUB_W>;
+def : PatGprGpr<sdiv, DIV_D>;
+def : PatGprGpr<udiv, DIV_DU>;
+def : PatGprGpr<srem, MOD_D>;
+def : PatGprGpr<urem, MOD_DU>;
 } // Predicates = [IsLA64]

 def : PatGprGpr<and, AND>;
 def : PatGprImm<and, ANDI, uimm12>;
 def : PatGprGpr<or, OR>;
 def : PatGprImm<or, ORI, uimm12>;
 def : PatGprGpr<xor, XOR>;
 def : PatGprImm<xor, XORI, uimm12>;
[... 256 lines not shown ...]

llvm/lib/Target/LoongArch/LoongArchMCInstLower.cpp

[... 57 lines not shown ...]
   case MachineOperand::MO_Immediate:
     MCOp = MCOperand::createImm(MO.getImm());
     break;
   case MachineOperand::MO_GlobalAddress:
     MCOp = lowerSymbolOperand(MO, AP.getSymbolPreferLocal(*MO.getGlobal()), AP);
     break;
   case MachineOperand::MO_MachineBasicBlock:
     MCOp = lowerSymbolOperand(MO, MO.getMBB()->getSymbol(), AP);
     break;
+  case MachineOperand::MO_ExternalSymbol:
+    MCOp = lowerSymbolOperand(
+        MO, AP.GetExternalSymbolSymbol(MO.getSymbolName()), AP);
+    break;
   // TODO: lower special operands
   case MachineOperand::MO_BlockAddress:
-  case MachineOperand::MO_ExternalSymbol:
   case MachineOperand::MO_ConstantPoolIndex:
   case MachineOperand::MO_JumpTableIndex:
     break;
   }
   return true;
 }

 bool llvm::lowerLoongArchMachineInstrToMCInst(const MachineInstr *MI,
[... 10 lines not shown ...]

llvm/test/CodeGen/LoongArch/ir-instruction/sdiv-udiv-srem-urem.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc --mtriple=loongarch32 < %s | FileCheck %s --check-prefix=LA32
				; RUN: llc --mtriple=loongarch64 < %s | FileCheck %s --check-prefix=LA64
				; RUN: llc --mtriple=loongarch32 -loongarch-check-zero-division < %s \
				; RUN:     | FileCheck %s --check-prefix=LA32-TRAP
				; RUN: llc --mtriple=loongarch64 -loongarch-check-zero-division < %s \
				; RUN:     | FileCheck %s --check-prefix=LA64-TRAP

				;; Test the sdiv/udiv/srem/urem LLVM IR.

				define i1 @sdiv_i1(i1 %a, i1 %b) {
				; LA32-LABEL: sdiv_i1:
				; LA32: # %bb.0: # %entry
				; LA32-NEXT: jirl $zero, $ra, 0
				;
				; LA64-LABEL: sdiv_i1:
				; LA64: # %bb.0: # %entry
				; LA64-NEXT: jirl $zero, $ra, 0
				;
				; LA32-TRAP-LABEL: sdiv_i1:
				; LA32-TRAP: # %bb.0: # %entry
				; LA32-TRAP-NEXT: jirl $zero, $ra, 0
				;
				; LA64-TRAP-LABEL: sdiv_i1:
				; LA64-TRAP: # %bb.0: # %entry
				; LA64-TRAP-NEXT: jirl $zero, $ra, 0
				entry:
				%r = sdiv i1 %a, %b
				ret i1 %r
				}

				define i8 @sdiv_i8(i8 %a, i8 %b) {
				; LA32-LABEL: sdiv_i8:
				; LA32: # %bb.0: # %entry
				; LA32-NEXT: ext.w.b $a1, $a1
				; LA32-NEXT: ext.w.b $a0, $a0
				; LA32-NEXT: div.w $a0, $a0, $a1
				; LA32-NEXT: jirl $zero, $ra, 0
				;
				; LA64-LABEL: sdiv_i8:
				; LA64: # %bb.0: # %entry
				; LA64-NEXT: ext.w.b $a1, $a1
				; LA64-NEXT: ext.w.b $a0, $a0
				; LA64-NEXT: div.d $a0, $a0, $a1
				; LA64-NEXT: jirl $zero, $ra, 0
				;
				; LA32-TRAP-LABEL: sdiv_i8:
				; LA32-TRAP: # %bb.0: # %entry
				; LA32-TRAP-NEXT: ext.w.b $a1, $a1
				; LA32-TRAP-NEXT: ext.w.b $a0, $a0
				; LA32-TRAP-NEXT: div.w $a0, $a0, $a1
				; LA32-TRAP-NEXT: bnez $a1, 8
				; LA32-TRAP-NEXT: break 7
				; LA32-TRAP-NEXT: jirl $zero, $ra, 0
				;
				; LA64-TRAP-LABEL: sdiv_i8:
				; LA64-TRAP: # %bb.0: # %entry
				; LA64-TRAP-NEXT: ext.w.b $a1, $a1
				; LA64-TRAP-NEXT: ext.w.b $a0, $a0
				; LA64-TRAP-NEXT: div.d $a0, $a0, $a1
				; LA64-TRAP-NEXT: bnez $a1, 8
				; LA64-TRAP-NEXT: break 7
				; LA64-TRAP-NEXT: jirl $zero, $ra, 0
				entry:
				%r = sdiv i8 %a, %b
				ret i8 %r
				}

				define i16 @sdiv_i16(i16 %a, i16 %b) {
				; LA32-LABEL: sdiv_i16:
				; LA32: # %bb.0: # %entry
				; LA32-NEXT: ext.w.h $a1, $a1
				; LA32-NEXT: ext.w.h $a0, $a0
				; LA32-NEXT: div.w $a0, $a0, $a1
				; LA32-NEXT: jirl $zero, $ra, 0
				;
				; LA64-LABEL: sdiv_i16:
				; LA64: # %bb.0: # %entry
				; LA64-NEXT: ext.w.h $a1, $a1
				; LA64-NEXT: ext.w.h $a0, $a0
				; LA64-NEXT: div.d $a0, $a0, $a1
				; LA64-NEXT: jirl $zero, $ra, 0
				;
				; LA32-TRAP-LABEL: sdiv_i16:
				; LA32-TRAP: # %bb.0: # %entry
				; LA32-TRAP-NEXT: ext.w.h $a1, $a1
				; LA32-TRAP-NEXT: ext.w.h $a0, $a0
				; LA32-TRAP-NEXT: div.w $a0, $a0, $a1
				; LA32-TRAP-NEXT: bnez $a1, 8
				; LA32-TRAP-NEXT: break 7
				; LA32-TRAP-NEXT: jirl $zero, $ra, 0
				;
				; LA64-TRAP-LABEL: sdiv_i16:
				; LA64-TRAP: # %bb.0: # %entry
				; LA64-TRAP-NEXT: ext.w.h $a1, $a1
				; LA64-TRAP-NEXT: ext.w.h $a0, $a0
				; LA64-TRAP-NEXT: div.d $a0, $a0, $a1
				; LA64-TRAP-NEXT: bnez $a1, 8
				; LA64-TRAP-NEXT: break 7
				; LA64-TRAP-NEXT: jirl $zero, $ra, 0
				entry:
				%r = sdiv i16 %a, %b
				ret i16 %r
				}

				define i32 @sdiv_i32(i32 %a, i32 %b) {
				; LA32-LABEL: sdiv_i32:
				; LA32: # %bb.0: # %entry
				; LA32-NEXT: div.w $a0, $a0, $a1
				; LA32-NEXT: jirl $zero, $ra, 0
				;
				; LA64-LABEL: sdiv_i32:
				; LA64: # %bb.0: # %entry
				; LA64-NEXT: addi.w $a1, $a1, 0
				; LA64-NEXT: addi.w $a0, $a0, 0
				; LA64-NEXT: div.d $a0, $a0, $a1
				; LA64-NEXT: jirl $zero, $ra, 0
				;
				; LA32-TRAP-LABEL: sdiv_i32:
				; LA32-TRAP: # %bb.0: # %entry
				; LA32-TRAP-NEXT: div.w $a0, $a0, $a1
				; LA32-TRAP-NEXT: bnez $a1, 8
				; LA32-TRAP-NEXT: break 7
				; LA32-TRAP-NEXT: jirl $zero, $ra, 0
				;
				; LA64-TRAP-LABEL: sdiv_i32:
				; LA64-TRAP: # %bb.0: # %entry
				; LA64-TRAP-NEXT: addi.w $a1, $a1, 0
				; LA64-TRAP-NEXT: addi.w $a0, $a0, 0
				; LA64-TRAP-NEXT: div.d $a0, $a0, $a1
				; LA64-TRAP-NEXT: bnez $a1, 8
				; LA64-TRAP-NEXT: break 7
				; LA64-TRAP-NEXT: jirl $zero, $ra, 0
				entry:
				%r = sdiv i32 %a, %b
				ret i32 %r
				}

				define i64 @sdiv_i64(i64 %a, i64 %b) {
				; LA32-LABEL: sdiv_i64:
				; LA32: # %bb.0: # %entry
				; LA32-NEXT: addi.w $sp, $sp, -16
				; LA32-NEXT: .cfi_def_cfa_offset 16
				; LA32-NEXT: st.w $ra, $sp, 12 # 4-byte Folded Spill
				; LA32-NEXT: .cfi_offset 1, -4
				; LA32-NEXT: bl __divdi3
				; LA32-NEXT: ld.w $ra, $sp, 12 # 4-byte Folded Reload
				; LA32-NEXT: addi.w $sp, $sp, 16
				; LA32-NEXT: jirl $zero, $ra, 0
				;
				; LA64-LABEL: sdiv_i64:
				; LA64: # %bb.0: # %entry
				; LA64-NEXT: div.d $a0, $a0, $a1
				; LA64-NEXT: jirl $zero, $ra, 0
				;
				; LA32-TRAP-LABEL: sdiv_i64:
				; LA32-TRAP: # %bb.0: # %entry
				; LA32-TRAP-NEXT: addi.w $sp, $sp, -16
				; LA32-TRAP-NEXT: .cfi_def_cfa_offset 16
				; LA32-TRAP-NEXT: st.w $ra, $sp, 12 # 4-byte Folded Spill
				; LA32-TRAP-NEXT: .cfi_offset 1, -4
				; LA32-TRAP-NEXT: bl __divdi3
				; LA32-TRAP-NEXT: ld.w $ra, $sp, 12 # 4-byte Folded Reload
				; LA32-TRAP-NEXT: addi.w $sp, $sp, 16
				; LA32-TRAP-NEXT: jirl $zero, $ra, 0
				;
				; LA64-TRAP-LABEL: sdiv_i64:
				; LA64-TRAP: # %bb.0: # %entry
				; LA64-TRAP-NEXT: div.d $a0, $a0, $a1
				; LA64-TRAP-NEXT: bnez $a1, 8
				; LA64-TRAP-NEXT: break 7
				; LA64-TRAP-NEXT: jirl $zero, $ra, 0
				entry:
				%r = sdiv i64 %a, %b
				ret i64 %r
				}

				define i1 @udiv_i1(i1 %a, i1 %b) {
				; LA32-LABEL: udiv_i1:
				; LA32: # %bb.0: # %entry
				; LA32-NEXT: jirl $zero, $ra, 0
				;
				; LA64-LABEL: udiv_i1:
				; LA64: # %bb.0: # %entry
				; LA64-NEXT: jirl $zero, $ra, 0
				;
				; LA32-TRAP-LABEL: udiv_i1:
				; LA32-TRAP: # %bb.0: # %entry
				; LA32-TRAP-NEXT: jirl $zero, $ra, 0
				;
				; LA64-TRAP-LABEL: udiv_i1:
				; LA64-TRAP: # %bb.0: # %entry
				; LA64-TRAP-NEXT: jirl $zero, $ra, 0
				entry:
				%r = udiv i1 %a, %b
				ret i1 %r
				}

				define i8 @udiv_i8(i8 %a, i8 %b) {
				; LA32-LABEL: udiv_i8:
				; LA32: # %bb.0: # %entry
				; LA32-NEXT: andi $a1, $a1, 255
				; LA32-NEXT: andi $a0, $a0, 255
				; LA32-NEXT: div.wu $a0, $a0, $a1
				; LA32-NEXT: jirl $zero, $ra, 0
				;
				; LA64-LABEL: udiv_i8:
				; LA64: # %bb.0: # %entry
				; LA64-NEXT: andi $a1, $a1, 255
				; LA64-NEXT: andi $a0, $a0, 255
				; LA64-NEXT: div.du $a0, $a0, $a1
				; LA64-NEXT: jirl $zero, $ra, 0
				;
				; LA32-TRAP-LABEL: udiv_i8:
				; LA32-TRAP: # %bb.0: # %entry
				; LA32-TRAP-NEXT: andi $a1, $a1, 255
				; LA32-TRAP-NEXT: andi $a0, $a0, 255
				; LA32-TRAP-NEXT: div.wu $a0, $a0, $a1
				; LA32-TRAP-NEXT: bnez $a1, 8
				; LA32-TRAP-NEXT: break 7
				; LA32-TRAP-NEXT: jirl $zero, $ra, 0
				;
				; LA64-TRAP-LABEL: udiv_i8:
				; LA64-TRAP: # %bb.0: # %entry
				; LA64-TRAP-NEXT: andi $a1, $a1, 255
				; LA64-TRAP-NEXT: andi $a0, $a0, 255
				; LA64-TRAP-NEXT: div.du $a0, $a0, $a1
				; LA64-TRAP-NEXT: bnez $a1, 8
				; LA64-TRAP-NEXT: break 7
				; LA64-TRAP-NEXT: jirl $zero, $ra, 0
				entry:
				%r = udiv i8 %a, %b
				ret i8 %r
				}

				define i16 @udiv_i16(i16 %a, i16 %b) {
				; LA32-LABEL: udiv_i16:
				; LA32: # %bb.0: # %entry
				; LA32-NEXT: bstrpick.w $a1, $a1, 15, 0
				; LA32-NEXT: bstrpick.w $a0, $a0, 15, 0
				; LA32-NEXT: div.wu $a0, $a0, $a1
				; LA32-NEXT: jirl $zero, $ra, 0
				;
				; LA64-LABEL: udiv_i16:
				; LA64: # %bb.0: # %entry
				; LA64-NEXT: bstrpick.d $a1, $a1, 15, 0
				; LA64-NEXT: bstrpick.d $a0, $a0, 15, 0
				; LA64-NEXT: div.du $a0, $a0, $a1
				; LA64-NEXT: jirl $zero, $ra, 0
				;
				; LA32-TRAP-LABEL: udiv_i16:
				; LA32-TRAP: # %bb.0: # %entry
				; LA32-TRAP-NEXT: bstrpick.w $a1, $a1, 15, 0
				; LA32-TRAP-NEXT: bstrpick.w $a0, $a0, 15, 0
				; LA32-TRAP-NEXT: div.wu $a0, $a0, $a1
				; LA32-TRAP-NEXT: bnez $a1, 8
				; LA32-TRAP-NEXT: break 7
				; LA32-TRAP-NEXT: jirl $zero, $ra, 0
				;
				; LA64-TRAP-LABEL: udiv_i16:
				; LA64-TRAP: # %bb.0: # %entry
				; LA64-TRAP-NEXT: bstrpick.d $a1, $a1, 15, 0
				; LA64-TRAP-NEXT: bstrpick.d $a0, $a0, 15, 0
				; LA64-TRAP-NEXT: div.du $a0, $a0, $a1
				; LA64-TRAP-NEXT: bnez $a1, 8
				; LA64-TRAP-NEXT: break 7
				; LA64-TRAP-NEXT: jirl $zero, $ra, 0
				entry:
				%r = udiv i16 %a, %b
				ret i16 %r
				}

				define i32 @udiv_i32(i32 %a, i32 %b) {
				; LA32-LABEL: udiv_i32:
				; LA32: # %bb.0: # %entry
				; LA32-NEXT: div.wu $a0, $a0, $a1
				; LA32-NEXT: jirl $zero, $ra, 0
				;
				; LA64-LABEL: udiv_i32:
				; LA64: # %bb.0: # %entry
				; LA64-NEXT: bstrpick.d $a1, $a1, 31, 0
				; LA64-NEXT: bstrpick.d $a0, $a0, 31, 0
				; LA64-NEXT: div.du $a0, $a0, $a1
				; LA64-NEXT: jirl $zero, $ra, 0
				;
				; LA32-TRAP-LABEL: udiv_i32:
				; LA32-TRAP: # %bb.0: # %entry
				; LA32-TRAP-NEXT: div.wu $a0, $a0, $a1
				; LA32-TRAP-NEXT: bnez $a1, 8
				; LA32-TRAP-NEXT: break 7
				; LA32-TRAP-NEXT: jirl $zero, $ra, 0
				;
				; LA64-TRAP-LABEL: udiv_i32:
				; LA64-TRAP: # %bb.0: # %entry
				; LA64-TRAP-NEXT: bstrpick.d $a1, $a1, 31, 0
				; LA64-TRAP-NEXT: bstrpick.d $a0, $a0, 31, 0
				; LA64-TRAP-NEXT: div.du $a0, $a0, $a1
				; LA64-TRAP-NEXT: bnez $a1, 8
				; LA64-TRAP-NEXT: break 7
				; LA64-TRAP-NEXT: jirl $zero, $ra, 0
				entry:
				%r = udiv i32 %a, %b
				ret i32 %r
				}

				define i64 @udiv_i64(i64 %a, i64 %b) {
				; LA32-LABEL: udiv_i64:
				; LA32: # %bb.0: # %entry
				; LA32-NEXT: addi.w $sp, $sp, -16
				; LA32-NEXT: .cfi_def_cfa_offset 16
				; LA32-NEXT: st.w $ra, $sp, 12 # 4-byte Folded Spill
				; LA32-NEXT: .cfi_offset 1, -4
				; LA32-NEXT: bl __udivdi3
				; LA32-NEXT: ld.w $ra, $sp, 12 # 4-byte Folded Reload
				; LA32-NEXT: addi.w $sp, $sp, 16
				; LA32-NEXT: jirl $zero, $ra, 0
				;
				; LA64-LABEL: udiv_i64:
				; LA64: # %bb.0: # %entry
				; LA64-NEXT: div.du $a0, $a0, $a1
				; LA64-NEXT: jirl $zero, $ra, 0
				;
				; LA32-TRAP-LABEL: udiv_i64:
				; LA32-TRAP: # %bb.0: # %entry
				; LA32-TRAP-NEXT: addi.w $sp, $sp, -16
				; LA32-TRAP-NEXT: .cfi_def_cfa_offset 16
				; LA32-TRAP-NEXT: st.w $ra, $sp, 12 # 4-byte Folded Spill
				; LA32-TRAP-NEXT: .cfi_offset 1, -4
				; LA32-TRAP-NEXT: bl __udivdi3
				; LA32-TRAP-NEXT: ld.w $ra, $sp, 12 # 4-byte Folded Reload
				; LA32-TRAP-NEXT: addi.w $sp, $sp, 16
				; LA32-TRAP-NEXT: jirl $zero, $ra, 0
				;
				; LA64-TRAP-LABEL: udiv_i64:
				; LA64-TRAP: # %bb.0: # %entry
				; LA64-TRAP-NEXT: div.du $a0, $a0, $a1
				; LA64-TRAP-NEXT: bnez $a1, 8
				; LA64-TRAP-NEXT: break 7
				; LA64-TRAP-NEXT: jirl $zero, $ra, 0
				entry:
				%r = udiv i64 %a, %b
				ret i64 %r
				}

				define i1 @srem_i1(i1 %a, i1 %b) {
				; LA32-LABEL: srem_i1:
				; LA32: # %bb.0: # %entry
				; LA32-NEXT: move $a0, $zero
				; LA32-NEXT: jirl $zero, $ra, 0
				;
				; LA64-LABEL: srem_i1:
				; LA64: # %bb.0: # %entry
				; LA64-NEXT: move $a0, $zero
				; LA64-NEXT: jirl $zero, $ra, 0
				;
				; LA32-TRAP-LABEL: srem_i1:
				; LA32-TRAP: # %bb.0: # %entry
				; LA32-TRAP-NEXT: move $a0, $zero
				; LA32-TRAP-NEXT: jirl $zero, $ra, 0
				;
				; LA64-TRAP-LABEL: srem_i1:
				; LA64-TRAP: # %bb.0: # %entry
				; LA64-TRAP-NEXT: move $a0, $zero
				; LA64-TRAP-NEXT: jirl $zero, $ra, 0
				entry:
				%r = srem i1 %a, %b
				ret i1 %r
				}

				define i8 @srem_i8(i8 %a, i8 %b) {
				; LA32-LABEL: srem_i8:
				; LA32: # %bb.0: # %entry
				; LA32-NEXT: ext.w.b $a1, $a1
				; LA32-NEXT: ext.w.b $a0, $a0
				; LA32-NEXT: mod.w $a0, $a0, $a1
				; LA32-NEXT: jirl $zero, $ra, 0
				;
				; LA64-LABEL: srem_i8:
				; LA64: # %bb.0: # %entry
				; LA64-NEXT: ext.w.b $a1, $a1
				; LA64-NEXT: ext.w.b $a0, $a0
				; LA64-NEXT: mod.d $a0, $a0, $a1
				; LA64-NEXT: jirl $zero, $ra, 0
				;
				; LA32-TRAP-LABEL: srem_i8:
				; LA32-TRAP: # %bb.0: # %entry
				; LA32-TRAP-NEXT: ext.w.b $a1, $a1
				; LA32-TRAP-NEXT: ext.w.b $a0, $a0
				; LA32-TRAP-NEXT: mod.w $a0, $a0, $a1
				; LA32-TRAP-NEXT: bnez $a1, 8
				; LA32-TRAP-NEXT: break 7
				; LA32-TRAP-NEXT: jirl $zero, $ra, 0
				;
				; LA64-TRAP-LABEL: srem_i8:
				; LA64-TRAP: # %bb.0: # %entry
				; LA64-TRAP-NEXT: ext.w.b $a1, $a1
				; LA64-TRAP-NEXT: ext.w.b $a0, $a0
				; LA64-TRAP-NEXT: mod.d $a0, $a0, $a1
				; LA64-TRAP-NEXT: bnez $a1, 8
				; LA64-TRAP-NEXT: break 7
				; LA64-TRAP-NEXT: jirl $zero, $ra, 0
				entry:
				%r = srem i8 %a, %b
				ret i8 %r
				}

				define i16 @srem_i16(i16 %a, i16 %b) {
				; LA32-LABEL: srem_i16:
				; LA32: # %bb.0: # %entry
				; LA32-NEXT: ext.w.h $a1, $a1
				; LA32-NEXT: ext.w.h $a0, $a0
				; LA32-NEXT: mod.w $a0, $a0, $a1
				; LA32-NEXT: jirl $zero, $ra, 0
				;
				; LA64-LABEL: srem_i16:
				; LA64: # %bb.0: # %entry
				; LA64-NEXT: ext.w.h $a1, $a1
				; LA64-NEXT: ext.w.h $a0, $a0
				; LA64-NEXT: mod.d $a0, $a0, $a1
				; LA64-NEXT: jirl $zero, $ra, 0
				;
				; LA32-TRAP-LABEL: srem_i16:
				; LA32-TRAP: # %bb.0: # %entry
				; LA32-TRAP-NEXT: ext.w.h $a1, $a1
				; LA32-TRAP-NEXT: ext.w.h $a0, $a0
				; LA32-TRAP-NEXT: mod.w $a0, $a0, $a1
				; LA32-TRAP-NEXT: bnez $a1, 8
				; LA32-TRAP-NEXT: break 7
				; LA32-TRAP-NEXT: jirl $zero, $ra, 0
				;
				; LA64-TRAP-LABEL: srem_i16:
				; LA64-TRAP: # %bb.0: # %entry
				; LA64-TRAP-NEXT: ext.w.h $a1, $a1
				; LA64-TRAP-NEXT: ext.w.h $a0, $a0
				; LA64-TRAP-NEXT: mod.d $a0, $a0, $a1
				; LA64-TRAP-NEXT: bnez $a1, 8
				; LA64-TRAP-NEXT: break 7
				; LA64-TRAP-NEXT: jirl $zero, $ra, 0
				entry:
				%r = srem i16 %a, %b
				ret i16 %r
				}

				define i32 @srem_i32(i32 %a, i32 %b) {
				; LA32-LABEL: srem_i32:
				; LA32: # %bb.0: # %entry
				; LA32-NEXT: mod.w $a0, $a0, $a1
				; LA32-NEXT: jirl $zero, $ra, 0
				;
				; LA64-LABEL: srem_i32:
				; LA64: # %bb.0: # %entry
				; LA64-NEXT: addi.w $a1, $a1, 0
				; LA64-NEXT: addi.w $a0, $a0, 0
				; LA64-NEXT: mod.d $a0, $a0, $a1
				; LA64-NEXT: jirl $zero, $ra, 0
				;
				; LA32-TRAP-LABEL: srem_i32:
				; LA32-TRAP: # %bb.0: # %entry
				; LA32-TRAP-NEXT: mod.w $a0, $a0, $a1
				; LA32-TRAP-NEXT: bnez $a1, 8
				; LA32-TRAP-NEXT: break 7
				; LA32-TRAP-NEXT: jirl $zero, $ra, 0
				;
				; LA64-TRAP-LABEL: srem_i32:
				; LA64-TRAP: # %bb.0: # %entry
				; LA64-TRAP-NEXT: addi.w $a1, $a1, 0
				; LA64-TRAP-NEXT: addi.w $a0, $a0, 0
				; LA64-TRAP-NEXT: mod.d $a0, $a0, $a1
				; LA64-TRAP-NEXT: bnez $a1, 8
				; LA64-TRAP-NEXT: break 7
				; LA64-TRAP-NEXT: jirl $zero, $ra, 0
				entry:
				%r = srem i32 %a, %b
				ret i32 %r
				}

				define i64 @srem_i64(i64 %a, i64 %b) {
				; LA32-LABEL: srem_i64:
				; LA32: # %bb.0: # %entry
				; LA32-NEXT: addi.w $sp, $sp, -16
				; LA32-NEXT: .cfi_def_cfa_offset 16
				; LA32-NEXT: st.w $ra, $sp, 12 # 4-byte Folded Spill
				; LA32-NEXT: .cfi_offset 1, -4
				; LA32-NEXT: bl __moddi3
				; LA32-NEXT: ld.w $ra, $sp, 12 # 4-byte Folded Reload
				; LA32-NEXT: addi.w $sp, $sp, 16
				; LA32-NEXT: jirl $zero, $ra, 0
				;
				; LA64-LABEL: srem_i64:
				; LA64: # %bb.0: # %entry
				; LA64-NEXT: mod.d $a0, $a0, $a1
				; LA64-NEXT: jirl $zero, $ra, 0
				;
				; LA32-TRAP-LABEL: srem_i64:
				; LA32-TRAP: # %bb.0: # %entry
				; LA32-TRAP-NEXT: addi.w $sp, $sp, -16
				; LA32-TRAP-NEXT: .cfi_def_cfa_offset 16
				; LA32-TRAP-NEXT: st.w $ra, $sp, 12 # 4-byte Folded Spill
				; LA32-TRAP-NEXT: .cfi_offset 1, -4
				; LA32-TRAP-NEXT: bl __moddi3
				; LA32-TRAP-NEXT: ld.w $ra, $sp, 12 # 4-byte Folded Reload
				; LA32-TRAP-NEXT: addi.w $sp, $sp, 16
				; LA32-TRAP-NEXT: jirl $zero, $ra, 0
				;
				; LA64-TRAP-LABEL: srem_i64:
				; LA64-TRAP: # %bb.0: # %entry
				; LA64-TRAP-NEXT: mod.d $a0, $a0, $a1
				; LA64-TRAP-NEXT: bnez $a1, 8
				; LA64-TRAP-NEXT: break 7
				; LA64-TRAP-NEXT: jirl $zero, $ra, 0
				entry:
				%r = srem i64 %a, %b
				ret i64 %r
				}

				define i1 @urem_i1(i1 %a, i1 %b) {
				; LA32-LABEL: urem_i1:
				; LA32: # %bb.0: # %entry
				; LA32-NEXT: move $a0, $zero
				; LA32-NEXT: jirl $zero, $ra, 0
				;
				; LA64-LABEL: urem_i1:
				; LA64: # %bb.0: # %entry
				; LA64-NEXT: move $a0, $zero
				; LA64-NEXT: jirl $zero, $ra, 0
				;
				; LA32-TRAP-LABEL: urem_i1:
				; LA32-TRAP: # %bb.0: # %entry
				; LA32-TRAP-NEXT: move $a0, $zero
				; LA32-TRAP-NEXT: jirl $zero, $ra, 0
				;
				; LA64-TRAP-LABEL: urem_i1:
				; LA64-TRAP: # %bb.0: # %entry
				; LA64-TRAP-NEXT: move $a0, $zero
				; LA64-TRAP-NEXT: jirl $zero, $ra, 0
				entry:
				%r = urem i1 %a, %b
				ret i1 %r
				}

				define i8 @urem_i8(i8 %a, i8 %b) {
				; LA32-LABEL: urem_i8:
				; LA32: # %bb.0: # %entry
				; LA32-NEXT: andi $a1, $a1, 255
				; LA32-NEXT: andi $a0, $a0, 255
				; LA32-NEXT: mod.wu $a0, $a0, $a1
				; LA32-NEXT: jirl $zero, $ra, 0
				;
				; LA64-LABEL: urem_i8:
				; LA64: # %bb.0: # %entry
				; LA64-NEXT: andi $a1, $a1, 255
				; LA64-NEXT: andi $a0, $a0, 255
				; LA64-NEXT: mod.du $a0, $a0, $a1
				; LA64-NEXT: jirl $zero, $ra, 0
				;
				; LA32-TRAP-LABEL: urem_i8:
				; LA32-TRAP: # %bb.0: # %entry
				; LA32-TRAP-NEXT: andi $a1, $a1, 255
				; LA32-TRAP-NEXT: andi $a0, $a0, 255
				; LA32-TRAP-NEXT: mod.wu $a0, $a0, $a1
				; LA32-TRAP-NEXT: bnez $a1, 8
				; LA32-TRAP-NEXT: break 7
				; LA32-TRAP-NEXT: jirl $zero, $ra, 0
				;
				; LA64-TRAP-LABEL: urem_i8:
				; LA64-TRAP: # %bb.0: # %entry
				; LA64-TRAP-NEXT: andi $a1, $a1, 255
				; LA64-TRAP-NEXT: andi $a0, $a0, 255
				; LA64-TRAP-NEXT: mod.du $a0, $a0, $a1
				; LA64-TRAP-NEXT: bnez $a1, 8
				; LA64-TRAP-NEXT: break 7
				; LA64-TRAP-NEXT: jirl $zero, $ra, 0
				entry:
				%r = urem i8 %a, %b
				ret i8 %r
				}

				define i16 @urem_i16(i16 %a, i16 %b) {
				; LA32-LABEL: urem_i16:
				; LA32: # %bb.0: # %entry
				; LA32-NEXT: bstrpick.w $a1, $a1, 15, 0
				; LA32-NEXT: bstrpick.w $a0, $a0, 15, 0
				; LA32-NEXT: mod.wu $a0, $a0, $a1
				; LA32-NEXT: jirl $zero, $ra, 0
				;
				; LA64-LABEL: urem_i16:
				; LA64: # %bb.0: # %entry
				; LA64-NEXT: bstrpick.d $a1, $a1, 15, 0
				; LA64-NEXT: bstrpick.d $a0, $a0, 15, 0
				; LA64-NEXT: mod.du $a0, $a0, $a1
				; LA64-NEXT: jirl $zero, $ra, 0
				;
				; LA32-TRAP-LABEL: urem_i16:
				; LA32-TRAP: # %bb.0: # %entry
				; LA32-TRAP-NEXT: bstrpick.w $a1, $a1, 15, 0
				; LA32-TRAP-NEXT: bstrpick.w $a0, $a0, 15, 0
				; LA32-TRAP-NEXT: mod.wu $a0, $a0, $a1
				; LA32-TRAP-NEXT: bnez $a1, 8
				; LA32-TRAP-NEXT: break 7
				; LA32-TRAP-NEXT: jirl $zero, $ra, 0
				;
				; LA64-TRAP-LABEL: urem_i16:
				; LA64-TRAP: # %bb.0: # %entry
				; LA64-TRAP-NEXT: bstrpick.d $a1, $a1, 15, 0
				; LA64-TRAP-NEXT: bstrpick.d $a0, $a0, 15, 0
				; LA64-TRAP-NEXT: mod.du $a0, $a0, $a1
				; LA64-TRAP-NEXT: bnez $a1, 8
				; LA64-TRAP-NEXT: break 7
				; LA64-TRAP-NEXT: jirl $zero, $ra, 0
				entry:
				%r = urem i16 %a, %b
				ret i16 %r
				}

				define i32 @urem_i32(i32 %a, i32 %b) {
				; LA32-LABEL: urem_i32:
				; LA32: # %bb.0: # %entry
				; LA32-NEXT: mod.wu $a0, $a0, $a1
				; LA32-NEXT: jirl $zero, $ra, 0
				;
				; LA64-LABEL: urem_i32:
				; LA64: # %bb.0: # %entry
				; LA64-NEXT: bstrpick.d $a1, $a1, 31, 0
				; LA64-NEXT: bstrpick.d $a0, $a0, 31, 0
				; LA64-NEXT: mod.du $a0, $a0, $a1
				; LA64-NEXT: jirl $zero, $ra, 0
				;
				; LA32-TRAP-LABEL: urem_i32:
				; LA32-TRAP: # %bb.0: # %entry
				; LA32-TRAP-NEXT: mod.wu $a0, $a0, $a1
				; LA32-TRAP-NEXT: bnez $a1, 8
				; LA32-TRAP-NEXT: break 7
				; LA32-TRAP-NEXT: jirl $zero, $ra, 0
				;
				; LA64-TRAP-LABEL: urem_i32:
				; LA64-TRAP: # %bb.0: # %entry
				; LA64-TRAP-NEXT: bstrpick.d $a1, $a1, 31, 0
				; LA64-TRAP-NEXT: bstrpick.d $a0, $a0, 31, 0
				; LA64-TRAP-NEXT: mod.du $a0, $a0, $a1
				; LA64-TRAP-NEXT: bnez $a1, 8
				; LA64-TRAP-NEXT: break 7
				; LA64-TRAP-NEXT: jirl $zero, $ra, 0
				entry:
				%r = urem i32 %a, %b
				ret i32 %r
				}

				define i64 @urem_i64(i64 %a, i64 %b) {
				; LA32-LABEL: urem_i64:
				; LA32: # %bb.0: # %entry
				; LA32-NEXT: addi.w $sp, $sp, -16
				; LA32-NEXT: .cfi_def_cfa_offset 16
				; LA32-NEXT: st.w $ra, $sp, 12 # 4-byte Folded Spill
				; LA32-NEXT: .cfi_offset 1, -4
				; LA32-NEXT: bl __umoddi3
				; LA32-NEXT: ld.w $ra, $sp, 12 # 4-byte Folded Reload
				; LA32-NEXT: addi.w $sp, $sp, 16
				; LA32-NEXT: jirl $zero, $ra, 0
				;
				; LA64-LABEL: urem_i64:
				; LA64: # %bb.0: # %entry
				; LA64-NEXT: mod.du $a0, $a0, $a1
				; LA64-NEXT: jirl $zero, $ra, 0
				;
				; LA32-TRAP-LABEL: urem_i64:
				; LA32-TRAP: # %bb.0: # %entry
				; LA32-TRAP-NEXT: addi.w $sp, $sp, -16
				; LA32-TRAP-NEXT: .cfi_def_cfa_offset 16
				; LA32-TRAP-NEXT: st.w $ra, $sp, 12 # 4-byte Folded Spill
				; LA32-TRAP-NEXT: .cfi_offset 1, -4
				; LA32-TRAP-NEXT: bl __umoddi3
				; LA32-TRAP-NEXT: ld.w $ra, $sp, 12 # 4-byte Folded Reload
				; LA32-TRAP-NEXT: addi.w $sp, $sp, 16
				; LA32-TRAP-NEXT: jirl $zero, $ra, 0
				;
				; LA64-TRAP-LABEL: urem_i64:
				; LA64-TRAP: # %bb.0: # %entry
				; LA64-TRAP-NEXT: mod.du $a0, $a0, $a1
				; LA64-TRAP-NEXT: bnez $a1, 8
				; LA64-TRAP-NEXT: break 7
				; LA64-TRAP-NEXT: jirl $zero, $ra, 0
				entry:
				%r = urem i64 %a, %b
				ret i64 %r
				}