This is an archive of the discontinued LLVM Phabricator instance.

Differential D128427

[LoongArch] Add codegen support for load/store operations
ClosedPublic

Authored by wangleiat on Jun 23 2022, 5:04 AM.

Details

Reviewers

SixWeining
MaskRay
xen0n
xry111

Commits

rG5b4851ed9113: [LoongArch] Add codegen support for load/store operations

Summary

This patch also supports lowering global addresses.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
60,100 ms	x64 debian > AddressSanitizer-x86_64-linux-dynamic.TestCases::scariness_score_test.cpp
60,110 ms	x64 debian > AddressSanitizer-x86_64-linux.TestCases::scariness_score_test.cpp
60,120 ms	x64 debian > LLVM.CodeGen/NVPTX::wmma.py

Event Timeline

wangleiat created this revision. Jun 23 2022, 5:04 AM

Herald added a project: Restricted Project. Jun 23 2022, 5:04 AM

Herald added subscribers: StephenFan, hiraditya.

wangleiat requested review of this revision. Jun 23 2022, 5:04 AM

Herald added a subscriber: llvm-commits.

wangleiat added a child revision: D128428: [LoongArch] Add codegen support for conditional branches. Jun 23 2022, 5:10 AM

Harbormaster completed remote builds in B171569: Diff 439338. Jun 23 2022, 5:55 AM

xen0n added inline comments. Jun 23 2022, 8:58 AM

llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
123	Because `pcalau12i` always produces results with the lower 12 bits clean, isn't `ori` more suitable for concatenating the lower bits?

wangleiat added inline comments. Jun 23 2022, 8:28 PM

llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
123	The linker will correct the `pcalau12i` instruction based on whether bit 12 is 1. The advantage of this is that there is a chance to combine the `addi.{d,w}` instruction with the load/store instruction. Example:
	pcalau12i $a1, %pc_hi20(G)
	addi.d $a2, $a1, %pc_lo12(G)
	ld.w $a1, $a2, 0
=>
	pcalau12i $a1, %pc_hi20(G)
	ld.w $a2, $a1, %pc_lo12(G)
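
The arithmetic behind this exchange can be sketched in C++. This is a minimal model, not LLVM or linker code: the helper names (`signExtend12`, `pageDelta`, `viaAddi`, `viaOri`, `viaAddiUnadjusted`) are hypothetical, and it assumes that `pcalau12i` clears the low 12 bits of PC and adds a page delta, that `addi.{w,d}` sign-extends its 12-bit immediate, and that `ori` zero-extends it. Under those assumptions the sign-extending form needs the high part rounded with `+0x800`, which corresponds to the linker-side correction described above. The 20-bit range limit of the real immediate is not modeled.

```cpp
#include <cstdint>

// Sign-extend the low 12 bits of V (models the immediate of addi.w/addi.d).
inline int64_t signExtend12(uint64_t V) {
  return static_cast<int64_t>((V & 0xfff) ^ 0x800) - 0x800;
}

// Page delta between Target and PC, computed modulo 2^64.
// Bias is 0x800 when the low part is consumed by a sign-extending
// instruction, and 0 when it is consumed by a zero-extending one.
inline uint64_t pageDelta(uint64_t PC, uint64_t Target, uint64_t Bias) {
  return ((Target + Bias) >> 12) - (PC >> 12);
}

// Model of pcalau12i: clear the low 12 bits of PC, then add Delta pages.
inline uint64_t pcalau12i(uint64_t PC, uint64_t Delta) {
  return (PC & ~UINT64_C(0xfff)) + (Delta << 12);
}

// pcalau12i + addi with the +0x800 adjustment applied to the high part.
inline uint64_t viaAddi(uint64_t PC, uint64_t Target) {
  uint64_t Hi = pcalau12i(PC, pageDelta(PC, Target, 0x800));
  return Hi + static_cast<uint64_t>(signExtend12(Target));
}

// pcalau12i + ori: no adjustment needed, the low bits are simply OR-ed in.
inline uint64_t viaOri(uint64_t PC, uint64_t Target) {
  uint64_t Hi = pcalau12i(PC, pageDelta(PC, Target, 0));
  return Hi | (Target & 0xfff);
}

// pcalau12i + addi WITHOUT the adjustment: off by one page whenever the
// sign bit of the low 12-bit part is set.
inline uint64_t viaAddiUnadjusted(uint64_t PC, uint64_t Target) {
  uint64_t Hi = pcalau12i(PC, pageDelta(PC, Target, 0));
  return Hi + static_cast<uint64_t>(signExtend12(Target));
}
```

Both `viaAddi` and `viaOri` reproduce the target address in this model. The difference is that only the sign-extending low part has the same form as a load/store offset, which is what makes the fold shown in the example above possible.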

MaskRay added inline comments. Jun 26 2022, 4:52 PM

llvm/test/CodeGen/LoongArch/ir-instruction/load-store.ll
29	Switch to opaque pointers.

MaskRay added inline comments. Jun 26 2022, 7:12 PM

llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
122	`SDValue AddrHi(DAG.getMachineNode(LoongArch::PCALAU12I, DL, Ty, GA), 0);`

Update test to use opaque pointers

wangleiat added inline comments. Jun 26 2022, 8:50 PM

llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
122	thanks!
llvm/test/CodeGen/LoongArch/ir-instruction/load-store.ll
29	thanks!

Harbormaster completed remote builds in B172115: Diff 440102. Jun 26 2022, 9:33 PM

PING
Do you have any comments on this series of patches?

While I'm not very sure about the new pcalau12i + addi symbol address materialization, they look good regardless, so are the other test cases.

Do you think a test exercising very large offset for getelementptr is worthwhile? LGTM otherwise.

This revision is now accepted and ready to land. Jul 1 2022, 6:37 AM

In D128427#3624889, @xen0n wrote:

While I'm not very sure about the new pcalau12i + addi symbol address materialization, they look good regardless, so are the other test cases.

Do you think a test exercising very large offset for getelementptr is worthwhile? LGTM otherwise.

Thanks!
The large-offset case is already covered by previous patches: such offsets are converted to immediate loads.
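
To make "large offset" concrete, here is a small C++ sketch of the range check implied by the `simm12` operand used in the `LdPat`/`StPat` patterns of this diff. The helper names (`fitsSImm12`, `gepByteOffset`) are hypothetical and this is not how the backend is written; it only assumes that a load/store offset is a signed 12-bit immediate and that a `getelementptr` over a plain array scales the index by the element size.

```cpp
#include <cstdint>

// True if Offset fits a signed 12-bit immediate, i.e. lies in [-2048, 2047].
inline bool fitsSImm12(int64_t Offset) {
  return Offset >= -2048 && Offset <= 2047;
}

// Byte offset of element Index in an array of ElemSize-byte elements.
inline int64_t gepByteOffset(int64_t Index, int64_t ElemSize) {
  return Index * ElemSize;
}
```

For instance, index 9 of an `i32` array gives a byte offset of 36, which fits and matches the `ld.w $a3, $a2, 36` check line in `load_store_global`. Index 1024 of an `i32` array gives 4096, which does not fit, so per the comment above the offset would be materialized as an immediate instead of being folded.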

xry111 added inline comments. Jul 1 2022, 7:48 PM

llvm/test/CodeGen/LoongArch/ir-instruction/load-store.ll
330	Based on the previous discussion, should we move "ori" to the last instruction in the long immediate load sequence and change it to "addi.d" if possible, so a peephole optimization would be able to combine "addi.d" and "ld" into one instruction?

SixWeining added inline comments. Jul 1 2022, 8:51 PM

llvm/test/CodeGen/LoongArch/ir-instruction/load-store.ll
330	In principle this can be implemented, but it seems suitable for only very few scenarios (maybe only when accessing a constant memory address). I'm not very sure.

In D128427#3626359, @wangleiat wrote:

In D128427#3624889, @xen0n wrote:

While I'm not very sure about the new pcalau12i + addi symbol address materialization, they look good regardless, so are the other test cases.

Do you think a test exercising very large offset for getelementptr is worthwhile? LGTM otherwise.

Thanks!
The large-offset case is already covered by previous patches: such offsets are converted to immediate loads.

Okay. No need to additionally cover it here then.

xen0n added inline comments. Jul 2 2022, 12:58 AM

llvm/test/CodeGen/LoongArch/ir-instruction/load-store.ll
330	I think this depends on the micro-architecture. If `lu12i.w; ori; lu32i.d; lu52i.d` sequences or variations of them are macro-op-fused, then breaking away from the current pattern could instead harm performance. OTOH it could be a net benefit, but as @SixWeining pointed out, this case of accessing absolute addresses is probably too rare to justify a special case.

xry111 added inline comments. Jul 2 2022, 4:01 AM

llvm/test/CodeGen/LoongArch/ir-instruction/load-store.ll
330	Then I'll withdraw the proposal, at least for now.

xry111 added inline comments. Jul 2 2022, 6:42 PM

llvm/test/CodeGen/LoongArch/ir-instruction/load-store.ll
330	I tested some very simple cases on a 3A5000LL. It seems you can move `ori` after `lu32i.d` or `lu52i.d`, or change it to `addi.d` or even `xori` w/o performance loss. Maybe my case is too noob and can't reflect the performance of real apps though.

xen0n added inline comments. Jul 2 2022, 8:08 PM

llvm/test/CodeGen/LoongArch/ir-instruction/load-store.ll
330	Note that I don't know whether there's macro-op fusion of this sort on 3A5000 in the first place; if this fusion actually doesn't happen on 3A5000 then there should be no performance loss whatsoever. We need input from the Loongson hardware team to be sure.
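
For readers following this thread, the four-instruction immediate sequence can be modeled as a split of a 64-bit value into four fields. The C++ below is only a sketch under stated assumptions: the struct and function names are hypothetical, and the instruction semantics are modeled as `lu12i.w` writing a sign-extended 20-bit value shifted left by 12, `ori` OR-ing in 12 zero-extended bits, `lu32i.d` replacing bits 63..32 with a sign-extended 20-bit value, and `lu52i.d` replacing bits 63..52. The value `0xdeadbeefdeadbeef` in the usage note is an assumption, chosen because its fields match the immediates in the `ld_sd_constant` check lines.

```cpp
#include <cstdint>

// The four immediate fields of a 64-bit constant.
struct Imm64Parts {
  int32_t Lu12i;  // bits 31..12, as a signed 20-bit value
  uint32_t Ori;   // bits 11..0, as an unsigned 12-bit value
  int32_t Lu32i;  // bits 51..32, as a signed 20-bit value
  int32_t Lu52i;  // bits 63..52, as a signed 12-bit value
};

// Sign-extend the low Bits bits of V (Bits is 12 or 20 here).
inline int32_t signExtendField(uint64_t V, unsigned Bits) {
  const uint64_t Sign = UINT64_C(1) << (Bits - 1);
  const uint64_t Mask = (UINT64_C(1) << Bits) - 1;
  return static_cast<int32_t>(static_cast<int64_t>((V & Mask) ^ Sign) -
                              static_cast<int64_t>(Sign));
}

// Split V into the fields consumed by lu12i.w, ori, lu32i.d and lu52i.d.
inline Imm64Parts splitImm64(uint64_t V) {
  return {signExtendField(V >> 12, 20), static_cast<uint32_t>(V & 0xfff),
          signExtendField(V >> 32, 20), signExtendField(V >> 52, 12)};
}

// Replay the sequence on a 64-bit register, starting from lu12i.w.
inline uint64_t rebuildImm64(const Imm64Parts &P) {
  // lu12i.w: sign-extended 20-bit immediate shifted left by 12.
  uint64_t R = static_cast<uint64_t>(static_cast<int64_t>(P.Lu12i)) << 12;
  // ori: OR in the low 12 bits.
  R |= P.Ori;
  // lu32i.d: keep bits 31..0, replace bits 63..32 with the sign-extended field.
  R = (R & UINT64_C(0xffffffff)) |
      (static_cast<uint64_t>(static_cast<int64_t>(P.Lu32i)) << 32);
  // lu52i.d: keep bits 51..0, replace bits 63..52.
  R = (R & UINT64_C(0x000fffffffffffff)) |
      (static_cast<uint64_t>(P.Lu52i & 0xfff) << 52);
  return R;
}
```

Splitting `0xdeadbeefdeadbeef` yields -136485, 3823 and -147729 for the first three fields, the same immediates that appear in the LA64 check lines for `ld_sd_constant`. In this model the `ori` step only touches bits 11..0, which no later step reads or overwrites, so moving it later in the sequence does not change the result. Whether that reordering is profitable is the micro-architectural question raised above and is not something this sketch can answer.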

This revision was landed with ongoing or failed builds. Jul 4 2022, 8:59 PM

Closed by commit rG5b4851ed9113: [LoongArch] Add codegen support for load/store operations (authored by wangleiat, committed by SixWeining).

This revision was automatically updated to reflect the committed changes.

SixWeining added a commit: rG5b4851ed9113: [LoongArch] Add codegen support for load/store operations.

MaskRay mentioned this in D129977: [LoongArch] Support load/store of dso_local PIC global values. Jul 17 2022, 5:17 PM

MaskRay mentioned this in rG9742166935f4: [LoongArch] Support load/store of dso_local PIC global values. Jul 21 2022, 7:38 PM

Revision Contents

Path

Size

llvm/

lib/

Target/

LoongArch/

LoongArchFloat32InstrInfo.td

8 lines

LoongArchFloat64InstrInfo.td

8 lines

LoongArchISelLowering.h

1 line

LoongArchISelLowering.cpp

27 lines

LoongArchInstrInfo.td

40 lines

LoongArchMCInstLower.cpp

20 lines

test/

CodeGen/

LoongArch/

ir-instruction/

load-store.ll

368 lines

Diff 439338

llvm/lib/Target/LoongArch/LoongArchFloat32InstrInfo.td

(168 lines not shown)
 def : PatFPSelectcc<SETONE, FCMP_CNE_S, FSEL_S, FPR32>;
 def : PatFPSelectcc<SETO, FCMP_COR_S, FSEL_S, FPR32>;
 def : PatFPSelectcc<SETUEQ, FCMP_CUEQ_S, FSEL_S, FPR32>;
 def : PatFPSelectcc<SETULT, FCMP_CULT_S, FSEL_S, FPR32>;
 def : PatFPSelectcc<SETULE, FCMP_CULE_S, FSEL_S, FPR32>;
 def : PatFPSelectcc<SETUNE, FCMP_CUNE_S, FSEL_S, FPR32>;
 def : PatFPSelectcc<SETUO, FCMP_CUN_S, FSEL_S, FPR32>;

+/// Loads
+
+defm : LdPat<load, FLD_S, f32>;
+
+/// Stores
+
+defm : StPat<store, FST_S, FPR32, f32>;
+
 } // Predicates = [HasBasicF]

llvm/lib/Target/LoongArch/LoongArchFloat64InstrInfo.td

(179 lines not shown)
 def : PatFPSelectcc<SETONE, FCMP_CNE_D, FSEL_D, FPR64>;
 def : PatFPSelectcc<SETO, FCMP_COR_D, FSEL_D, FPR64>;
 def : PatFPSelectcc<SETUEQ, FCMP_CUEQ_D, FSEL_D, FPR64>;
 def : PatFPSelectcc<SETULT, FCMP_CULT_D, FSEL_D, FPR64>;
 def : PatFPSelectcc<SETULE, FCMP_CULE_D, FSEL_D, FPR64>;
 def : PatFPSelectcc<SETUNE, FCMP_CUNE_D, FSEL_D, FPR64>;
 def : PatFPSelectcc<SETUO, FCMP_CUN_D, FSEL_D, FPR64>;

+/// Loads
+
+defm : LdPat<load, FLD_D, f64>;
+
+/// Stores
+
+defm : StPat<store, FST_D, FPR64, f64>;
+
 } // Predicates = [HasBasicD]

llvm/lib/Target/LoongArch/LoongArchISelLowering.h

(80 lines not shown; context: private:)

   void analyzeInputArgs(CCState &CCInfo,
                         const SmallVectorImpl<ISD::InputArg> &Ins,
                         LoongArchCCAssignFn Fn) const;
   void analyzeOutputArgs(CCState &CCInfo,
                          const SmallVectorImpl<ISD::OutputArg> &Outs,
                          LoongArchCCAssignFn Fn) const;

+  SDValue lowerGlobalAddress(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerShiftLeftParts(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerShiftRightParts(SDValue Op, SelectionDAG &DAG, bool IsSRA) const;
 };

 } // end namespace llvm

 #endif // LLVM_LIB_TARGET_LOONGARCH_LOONGARCHISELLOWERING_H

llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp

(11 lines not shown)
 //===----------------------------------------------------------------------===//

 #include "LoongArchISelLowering.h"
 #include "LoongArch.h"
 #include "LoongArchMachineFunctionInfo.h"
 #include "LoongArchRegisterInfo.h"
 #include "LoongArchSubtarget.h"
 #include "LoongArchTargetMachine.h"
+#include "MCTargetDesc/LoongArchMCTargetDesc.h"
 #include "llvm/ADT/Statistic.h"
 #include "llvm/CodeGen/ISDOpcodes.h"
 #include "llvm/Support/Debug.h"

 using namespace llvm;

 #define DEBUG_TYPE "loongarch-isel-lowering"

 LoongArchTargetLowering::LoongArchTargetLowering(const TargetMachine &TM,
                                                  const LoongArchSubtarget &STI)
     : TargetLowering(TM), Subtarget(STI) {

   MVT GRLenVT = Subtarget.getGRLenVT();
   // Set up the register classes.
   addRegisterClass(GRLenVT, &LoongArch::GPRRegClass);
   if (Subtarget.hasBasicF())
     addRegisterClass(MVT::f32, &LoongArch::FPR32RegClass);
   if (Subtarget.hasBasicD())
     addRegisterClass(MVT::f64, &LoongArch::FPR64RegClass);

+  setLoadExtAction({ISD::EXTLOAD, ISD::SEXTLOAD, ISD::ZEXTLOAD}, GRLenVT,
+                   MVT::i1, Promote);
+
   // TODO: add necessary setOperationAction calls later.
   setOperationAction(ISD::SHL_PARTS, GRLenVT, Custom);
   setOperationAction(ISD::SRA_PARTS, GRLenVT, Custom);
   setOperationAction(ISD::SRL_PARTS, GRLenVT, Custom);

+  setOperationAction(ISD::GlobalAddress, GRLenVT, Custom);
+
   if (Subtarget.is64Bit()) {
     setOperationAction(ISD::SHL, MVT::i32, Custom);
     setOperationAction(ISD::SRA, MVT::i32, Custom);
     setOperationAction(ISD::SRL, MVT::i32, Custom);
   }

   static const ISD::CondCode FPCCToExpand[] = {ISD::SETOGT, ISD::SETOGE,
                                                ISD::SETUGT, ISD::SETUGE};
(25 lines not shown; context: LoongArchTargetLowering::LoongArchTargetLowering(const TargetMachine &TM,)
   setTargetDAGCombine(ISD::SRL);
 }

 SDValue LoongArchTargetLowering::LowerOperation(SDValue Op,
                                                 SelectionDAG &DAG) const {
   switch (Op.getOpcode()) {
   default:
     report_fatal_error("unimplemented operand");
+  case ISD::GlobalAddress:
+    return lowerGlobalAddress(Op, DAG);
   case ISD::SHL_PARTS:
     return lowerShiftLeftParts(Op, DAG);
   case ISD::SRA_PARTS:
     return lowerShiftRightParts(Op, DAG, true);
   case ISD::SRL_PARTS:
     return lowerShiftRightParts(Op, DAG, false);
   case ISD::SHL:
   case ISD::SRA:
   case ISD::SRL:
     // This can be called for an i32 shift amount that needs to be promoted.
     assert(Op.getOperand(1).getValueType() == MVT::i32 && Subtarget.is64Bit() &&
            "Unexpected custom legalisation");
     return SDValue();
   }
 }

+SDValue LoongArchTargetLowering::lowerGlobalAddress(SDValue Op,
+                                                    SelectionDAG &DAG) const {
+  SDLoc DL(Op);
+  EVT Ty = getPointerTy(DAG.getDataLayout());
+  const GlobalValue *GV = cast<GlobalAddressSDNode>(Op)->getGlobal();
+  unsigned ADDIOp = Subtarget.is64Bit() ? LoongArch::ADDI_D : LoongArch::ADDI_W;
+
+  // FIXME: Only support PC-relative addressing to access the symbol.
+  // TODO: Add target flags.
+  if (!isPositionIndependent()) {
+    SDValue GA = DAG.getTargetGlobalAddress(GV, DL, Ty);
+    SDValue AddrHi =
+        SDValue(DAG.getMachineNode(LoongArch::PCALAU12I, DL, Ty, GA), 0);
+    SDValue Addr = SDValue(DAG.getMachineNode(ADDIOp, DL, Ty, AddrHi, GA), 0);
+    return Addr;
+  }
+  report_fatal_error("Unable to lowerGlobalAddress");
+}
+
 SDValue LoongArchTargetLowering::lowerShiftLeftParts(SDValue Op,
                                                      SelectionDAG &DAG) const {
   SDLoc DL(Op);
   SDValue Lo = Op.getOperand(0);
   SDValue Hi = Op.getOperand(1);
   SDValue Shamt = Op.getOperand(2);
   EVT VT = Lo.getValueType();

(422 lines not shown)

llvm/lib/Target/LoongArch/LoongArchInstrInfo.td

(657 lines not shown)
 let Predicates = [IsLA32] in
 def : Pat<(loongarch_bstrpick GPR:$rj, uimm5:$msbd, uimm5:$lsbd),
           (BSTRPICK_W GPR:$rj, uimm5:$msbd, uimm5:$lsbd)>;

 let Predicates = [IsLA64] in
 def : Pat<(loongarch_bstrpick GPR:$rj, uimm6:$msbd, uimm6:$lsbd),
           (BSTRPICK_D GPR:$rj, uimm6:$msbd, uimm6:$lsbd)>;

+/// Loads
+
+multiclass LdPat<PatFrag LoadOp, LAInst Inst, ValueType vt = GRLenVT> {
+  def : Pat<(vt (LoadOp GPR:$rj)), (Inst GPR:$rj, 0)>;
+  def : Pat<(vt (LoadOp (add GPR:$rj, simm12:$imm12))),
+            (Inst GPR:$rj, simm12:$imm12)>;
+}
+
+defm : LdPat<sextloadi8, LD_B>;
+defm : LdPat<extloadi8, LD_B>;
+defm : LdPat<sextloadi16, LD_H>;
+defm : LdPat<extloadi16, LD_H>;
+defm : LdPat<load, LD_W>, Requires<[IsLA32]>;
+defm : LdPat<zextloadi8, LD_BU>;
+defm : LdPat<zextloadi16, LD_HU>;
+let Predicates = [IsLA64] in {
+defm : LdPat<sextloadi32, LD_W, i64>;
+defm : LdPat<extloadi32, LD_W, i64>;
+defm : LdPat<zextloadi32, LD_WU, i64>;
+defm : LdPat<load, LD_D, i64>;
+} // Predicates = [IsLA64]
+
+/// Stores
+
+multiclass StPat<PatFrag StoreOp, LAInst Inst, RegisterClass StTy,
+                 ValueType vt> {
+  def : Pat<(StoreOp (vt StTy:$rd), GPR:$rj),
+            (Inst StTy:$rd, GPR:$rj, 0)>;
+  def : Pat<(StoreOp (vt StTy:$rd), (add GPR:$rj, simm12:$imm12)),
+            (Inst StTy:$rd, GPR:$rj, simm12:$imm12)>;
+}
+
+defm : StPat<truncstorei8, ST_B, GPR, GRLenVT>;
+defm : StPat<truncstorei16, ST_H, GPR, GRLenVT>;
+defm : StPat<store, ST_W, GPR, i32>, Requires<[IsLA32]>;
+let Predicates = [IsLA64] in {
+defm : StPat<truncstorei32, ST_W, GPR, i64>;
+defm : StPat<store, ST_D, GPR, i64>;
+} // Predicates = [IsLA64]
+
 //===----------------------------------------------------------------------===//
 // Assembler Pseudo Instructions
 //===----------------------------------------------------------------------===//

 def : InstAlias<"nop", (ANDI R0, R0, 0)>;
 def : InstAlias<"move $dst, $src", (OR GPR:$dst, GPR:$src, R0)>;

 //===----------------------------------------------------------------------===//
(57 lines not shown)

llvm/lib/Target/LoongArch/LoongArchMCInstLower.cpp

(16 lines not shown)
 #include "llvm/CodeGen/MachineBasicBlock.h"
 #include "llvm/CodeGen/MachineInstr.h"
 #include "llvm/MC/MCAsmInfo.h"
 #include "llvm/MC/MCContext.h"
 #include "llvm/Support/raw_ostream.h"

 using namespace llvm;

+static MCOperand lowerSymbolOperand(const MachineOperand &MO, MCSymbol *Sym,
+                                    const AsmPrinter &AP) {
+  MCContext &Ctx = AP.OutContext;
+
+  // TODO: Processing target flags.
+
+  const MCExpr *ME =
+      MCSymbolRefExpr::create(Sym, MCSymbolRefExpr::VK_None, Ctx);
+
+  if (!MO.isJTI() && !MO.isMBB() && MO.getOffset())
+    ME = MCBinaryExpr::createAdd(
+        ME, MCConstantExpr::create(MO.getOffset(), Ctx), Ctx);
+
+  return MCOperand::createExpr(ME);
+}
+
 bool llvm::lowerLoongArchMachineOperandToMCOperand(const MachineOperand &MO,
                                                    MCOperand &MCOp,
                                                    const AsmPrinter &AP) {
   switch (MO.getType()) {
   default:
     report_fatal_error(
         "lowerLoongArchMachineOperandToMCOperand: unknown operand type");
   case MachineOperand::MO_Register:
     // Ignore all implicit register operands.
     if (MO.isImplicit())
       return false;
     MCOp = MCOperand::createReg(MO.getReg());
     break;
   case MachineOperand::MO_RegisterMask:
     // Regmasks are like implicit defs.
     return false;
   case MachineOperand::MO_Immediate:
     MCOp = MCOperand::createImm(MO.getImm());
     break;
+  case MachineOperand::MO_GlobalAddress:
+    MCOp = lowerSymbolOperand(MO, AP.getSymbolPreferLocal(*MO.getGlobal()), AP);
+    break;
   // TODO: lower special operands
   case MachineOperand::MO_MachineBasicBlock:
-  case MachineOperand::MO_GlobalAddress:
   case MachineOperand::MO_BlockAddress:
   case MachineOperand::MO_ExternalSymbol:
   case MachineOperand::MO_ConstantPoolIndex:
   case MachineOperand::MO_JumpTableIndex:
     break;
   }
   return true;
 }
(12 lines not shown)

llvm/test/CodeGen/LoongArch/ir-instruction/load-store.ll

This file was added.

				; RUN: llc --mtriple=loongarch32 --mattr=+d < %s | FileCheck %s --check-prefixes=ALL,LA32
				; RUN: llc --mtriple=loongarch64 --mattr=+d < %s | FileCheck %s --check-prefixes=ALL,LA64

				;; Check load from and store to a global mem.
				@G = global i32 0

				define i32 @load_store_global(i32 %a) nounwind {
				; LA32-LABEL: load_store_global:
				; LA32: # %bb.0:
				; LA32-NEXT: pcalau12i $a1, G
				; LA32-NEXT: addi.w $a2, $a1, G
				; LA32-NEXT: ld.w $a1, $a2, 0
				; LA32-NEXT: st.w $a0, $a2, 0
				; LA32-NEXT: ld.w $a3, $a2, 36
				; LA32-NEXT: st.w $a0, $a2, 36
				; LA32-NEXT: move $a0, $a1
				; LA32-NEXT: jirl $zero, $ra, 0
				;
				; LA64-LABEL: load_store_global:
				; LA64: # %bb.0:
				; LA64-NEXT: pcalau12i $a1, G
				; LA64-NEXT: addi.d $a2, $a1, G
				; LA64-NEXT: ld.w $a1, $a2, 0
				; LA64-NEXT: st.w $a0, $a2, 0
				; LA64-NEXT: ld.w $a3, $a2, 36
				; LA64-NEXT: st.w $a0, $a2, 36
				; LA64-NEXT: move $a0, $a1
				; LA64-NEXT: jirl $zero, $ra, 0
				%1 = load volatile i32, i32* @G
				store i32 %a, i32* @G
				%2 = getelementptr i32, i32* @G, i32 9
				%3 = load volatile i32, i32* %2
				store i32 %a, i32* %2
				ret i32 %1
				}

				;; Check indexed and unindexed, sext, zext and anyext loads.

				define i64 @ld_b(i8 *%a) nounwind {
				; LA32-LABEL: ld_b:
				; LA32: # %bb.0:
				; LA32-NEXT: ld.b $a1, $a0, 0
				; LA32-NEXT: ld.b $a0, $a0, 1
				; LA32-NEXT: srai.w $a1, $a0, 31
				; LA32-NEXT: jirl $zero, $ra, 0
				;
				; LA64-LABEL: ld_b:
				; LA64: # %bb.0:
				; LA64-NEXT: ld.b $a1, $a0, 0
				; LA64-NEXT: ld.b $a0, $a0, 1
				; LA64-NEXT: jirl $zero, $ra, 0
				%1 = getelementptr i8, i8* %a, i64 1
				%2 = load i8, i8* %1
				%3 = sext i8 %2 to i64
				%4 = load volatile i8, i8* %a
				ret i64 %3
				}

				define i64 @ld_h(i16 *%a) nounwind {
				; LA32-LABEL: ld_h:
				; LA32: # %bb.0:
				; LA32-NEXT: ld.h $a1, $a0, 0
				; LA32-NEXT: ld.h $a0, $a0, 4
				; LA32-NEXT: srai.w $a1, $a0, 31
				; LA32-NEXT: jirl $zero, $ra, 0
				;
				; LA64-LABEL: ld_h:
				; LA64: # %bb.0:
				; LA64-NEXT: ld.h $a1, $a0, 0
				; LA64-NEXT: ld.h $a0, $a0, 4
				; LA64-NEXT: jirl $zero, $ra, 0
				%1 = getelementptr i16, i16* %a, i64 2
				%2 = load i16, i16* %1
				%3 = sext i16 %2 to i64
				%4 = load volatile i16, i16* %a
				ret i64 %3
				}

				define i64 @ld_w(i32 *%a) nounwind {
				; LA32-LABEL: ld_w:
				; LA32: # %bb.0:
				; LA32-NEXT: ld.w $a1, $a0, 0
				; LA32-NEXT: ld.w $a0, $a0, 12
				; LA32-NEXT: srai.w $a1, $a0, 31
				; LA32-NEXT: jirl $zero, $ra, 0
				;
				; LA64-LABEL: ld_w:
				; LA64: # %bb.0:
				; LA64-NEXT: ld.w $a1, $a0, 0
				; LA64-NEXT: ld.w $a0, $a0, 12
				; LA64-NEXT: jirl $zero, $ra, 0
				%1 = getelementptr i32, i32* %a, i64 3
				%2 = load i32, i32* %1
				%3 = sext i32 %2 to i64
				%4 = load volatile i32, i32* %a
				ret i64 %3
				}

				define i64 @ld_d(i64 *%a) nounwind {
				; LA32-LABEL: ld_d:
				; LA32: # %bb.0:
				; LA32-NEXT: ld.w $a1, $a0, 4
				; LA32-NEXT: ld.w $a1, $a0, 0
				; LA32-NEXT: ld.w $a1, $a0, 28
				; LA32-NEXT: ld.w $a0, $a0, 24
				; LA32-NEXT: jirl $zero, $ra, 0
				;
				; LA64-LABEL: ld_d:
				; LA64: # %bb.0:
				; LA64-NEXT: ld.d $a1, $a0, 0
				; LA64-NEXT: ld.d $a0, $a0, 24
				; LA64-NEXT: jirl $zero, $ra, 0
				%1 = getelementptr i64, i64* %a, i64 3
				%2 = load i64, i64* %1
				%3 = load volatile i64, i64* %a
				ret i64 %2
				}

				define i64 @ld_bu(i8 *%a) nounwind {
				; LA32-LABEL: ld_bu:
				; LA32: # %bb.0:
				; LA32-NEXT: ld.bu $a1, $a0, 0
				; LA32-NEXT: ld.bu $a2, $a0, 4
				; LA32-NEXT: add.w $a0, $a2, $a1
				; LA32-NEXT: sltu $a1, $a0, $a2
				; LA32-NEXT: jirl $zero, $ra, 0
				;
				; LA64-LABEL: ld_bu:
				; LA64: # %bb.0:
				; LA64-NEXT: ld.bu $a1, $a0, 0
				; LA64-NEXT: ld.bu $a0, $a0, 4
				; LA64-NEXT: add.d $a0, $a0, $a1
				; LA64-NEXT: jirl $zero, $ra, 0
				%1 = getelementptr i8, i8* %a, i64 4
				%2 = load i8, i8* %1
				%3 = zext i8 %2 to i64
				%4 = load volatile i8, i8* %a
				%5 = zext i8 %4 to i64
				%6 = add i64 %3, %5
				ret i64 %6
				}

				define i64 @ld_hu(i16 *%a) nounwind {
				; LA32-LABEL: ld_hu:
				; LA32: # %bb.0:
				; LA32-NEXT: ld.hu $a1, $a0, 0
				; LA32-NEXT: ld.hu $a2, $a0, 10
				; LA32-NEXT: add.w $a0, $a2, $a1
				; LA32-NEXT: sltu $a1, $a0, $a2
				; LA32-NEXT: jirl $zero, $ra, 0
				;
				; LA64-LABEL: ld_hu:
				; LA64: # %bb.0:
				; LA64-NEXT: ld.hu $a1, $a0, 0
				; LA64-NEXT: ld.hu $a0, $a0, 10
				; LA64-NEXT: add.d $a0, $a0, $a1
				; LA64-NEXT: jirl $zero, $ra, 0
				%1 = getelementptr i16, i16* %a, i64 5
				%2 = load i16, i16* %1
				%3 = zext i16 %2 to i64
				%4 = load volatile i16, i16* %a
				%5 = zext i16 %4 to i64
				%6 = add i64 %3, %5
				ret i64 %6
				}

				define i64 @ld_wu(i32 *%a) nounwind {
				; LA32-LABEL: ld_wu:
				; LA32: # %bb.0:
				; LA32-NEXT: ld.w $a1, $a0, 0
				; LA32-NEXT: ld.w $a2, $a0, 20
				; LA32-NEXT: add.w $a0, $a2, $a1
				; LA32-NEXT: sltu $a1, $a0, $a2
				; LA32-NEXT: jirl $zero, $ra, 0
				;
				; LA64-LABEL: ld_wu:
				; LA64: # %bb.0:
				; LA64-NEXT: ld.wu $a1, $a0, 0
				; LA64-NEXT: ld.wu $a0, $a0, 20
				; LA64-NEXT: add.d $a0, $a0, $a1
				; LA64-NEXT: jirl $zero, $ra, 0
				%1 = getelementptr i32, i32* %a, i64 5
				%2 = load i32, i32* %1
				%3 = zext i32 %2 to i64
				%4 = load volatile i32, i32* %a
				%5 = zext i32 %4 to i64
				%6 = add i64 %3, %5
				ret i64 %6
				}

				;; Check indexed and unindexed stores.

				define void @st_b(i8 *%a, i8 %b) nounwind {
				; ALL-LABEL: st_b:
				; ALL: # %bb.0:
				; ALL-NEXT: st.b $a1, $a0, 6
				; ALL-NEXT: st.b $a1, $a0, 0
				; ALL-NEXT: jirl $zero, $ra, 0
				store i8 %b, i8* %a
				%1 = getelementptr i8, i8* %a, i64 6
				store i8 %b, i8* %1
				ret void
				}

				define void @st_h(i16 *%a, i16 %b) nounwind {
				; ALL-LABEL: st_h:
				; ALL: # %bb.0:
				; ALL-NEXT: st.h $a1, $a0, 14
				; ALL-NEXT: st.h $a1, $a0, 0
				; ALL-NEXT: jirl $zero, $ra, 0
				store i16 %b, i16* %a
				%1 = getelementptr i16, i16* %a, i64 7
				store i16 %b, i16* %1
				ret void
				}

				define void @st_w(i32 *%a, i32 %b) nounwind {
				; ALL-LABEL: st_w:
				; ALL: # %bb.0:
				; ALL-NEXT: st.w $a1, $a0, 28
				; ALL-NEXT: st.w $a1, $a0, 0
				; ALL-NEXT: jirl $zero, $ra, 0
				store i32 %b, i32* %a
				%1 = getelementptr i32, i32* %a, i64 7
				store i32 %b, i32* %1
				ret void
				}

				define void @st_d(i64 *%a, i64 %b) nounwind {
				; LA32-LABEL: st_d:
				; LA32: # %bb.0:
				; LA32-NEXT: st.w $a2, $a0, 68
				; LA32-NEXT: st.w $a2, $a0, 4
				; LA32-NEXT: st.w $a1, $a0, 64
				; LA32-NEXT: st.w $a1, $a0, 0
				; LA32-NEXT: jirl $zero, $ra, 0
				;
				; LA64-LABEL: st_d:
				; LA64: # %bb.0:
				; LA64-NEXT: st.d $a1, $a0, 64
				; LA64-NEXT: st.d $a1, $a0, 0
				; LA64-NEXT: jirl $zero, $ra, 0
				store i64 %b, i64* %a
				%1 = getelementptr i64, i64* %a, i64 8
				store i64 %b, i64* %1
				ret void
				}

				;; Check load from and store to an i1 location.
				define i64 @load_sext_zext_anyext_i1(i1 *%a) nounwind {
				;; sextload i1
				; LA32-LABEL: load_sext_zext_anyext_i1:
				; LA32: # %bb.0:
				; LA32-NEXT: ld.b $a1, $a0, 0
				; LA32-NEXT: ld.bu $a1, $a0, 1
				; LA32-NEXT: ld.bu $a2, $a0, 2
				; LA32-NEXT: sub.w $a0, $a2, $a1
				; LA32-NEXT: sltu $a1, $a2, $a1
				; LA32-NEXT: sub.w $a1, $zero, $a1
				; LA32-NEXT: jirl $zero, $ra, 0
				;
				; LA64-LABEL: load_sext_zext_anyext_i1:
				; LA64: # %bb.0:
				; LA64-NEXT: ld.b $a1, $a0, 0
				; LA64-NEXT: ld.bu $a1, $a0, 1
				; LA64-NEXT: ld.bu $a0, $a0, 2
				; LA64-NEXT: sub.d $a0, $a0, $a1
				; LA64-NEXT: jirl $zero, $ra, 0
				%1 = getelementptr i1, i1* %a, i64 1
				%2 = load i1, i1* %1
				%3 = sext i1 %2 to i64
				;; zextload i1
				%4 = getelementptr i1, i1* %a, i64 2
				%5 = load i1, i1* %4
				%6 = zext i1 %5 to i64
				%7 = add i64 %3, %6
				;; extload i1 (anyext). Produced as the load is unused.
				%8 = load volatile i1, i1* %a
				ret i64 %7
				}

				define i16 @load_sext_zext_anyext_i1_i16(i1 *%a) nounwind {
				;; sextload i1
				; LA32-LABEL: load_sext_zext_anyext_i1_i16:
				; LA32: # %bb.0:
				; LA32-NEXT: ld.b $a1, $a0, 0
				; LA32-NEXT: ld.bu $a1, $a0, 1
				; LA32-NEXT: ld.bu $a0, $a0, 2
				; LA32-NEXT: sub.w $a0, $a0, $a1
				; LA32-NEXT: jirl $zero, $ra, 0
				;
				; LA64-LABEL: load_sext_zext_anyext_i1_i16:
				; LA64: # %bb.0:
				; LA64-NEXT: ld.b $a1, $a0, 0
				; LA64-NEXT: ld.bu $a1, $a0, 1
				; LA64-NEXT: ld.bu $a0, $a0, 2
				; LA64-NEXT: sub.d $a0, $a0, $a1
				; LA64-NEXT: jirl $zero, $ra, 0
				%1 = getelementptr i1, i1* %a, i64 1
				%2 = load i1, i1* %1
				%3 = sext i1 %2 to i16
				;; zextload i1
				%4 = getelementptr i1, i1* %a, i64 2
				%5 = load i1, i1* %4
				%6 = zext i1 %5 to i16
				%7 = add i16 %3, %6
				;; extload i1 (anyext). Produced as the load is unused.
				%8 = load volatile i1, i1* %a
				ret i16 %7
				}

				define i64 @ld_sd_constant(i64 %a) nounwind {
				; LA32-LABEL: ld_sd_constant:
				; LA32: # %bb.0:
				; LA32-NEXT: lu12i.w $a3, -136485
				; LA32-NEXT: ori $a4, $a3, 3823
				; LA32-NEXT: ld.w $a2, $a4, 0
				; LA32-NEXT: st.w $a0, $a4, 0
				; LA32-NEXT: ori $a0, $a3, 3827
				; LA32-NEXT: ld.w $a3, $a0, 0
				; LA32-NEXT: st.w $a1, $a0, 0
				; LA32-NEXT: move $a0, $a2
				; LA32-NEXT: move $a1, $a3
				; LA32-NEXT: jirl $zero, $ra, 0
				;
				; LA64-LABEL: ld_sd_constant:
				; LA64: # %bb.0:
				; LA64-NEXT: lu12i.w $a1, -136485
				; LA64-NEXT: ori $a1, $a1, 3823
				; LA64-NEXT: lu32i.d $a1, -147729
				xry111: Based on the previous discussion, should we move "ori" to the last instruction in the long immediate load sequence and change it to "addi.d" if possible, so a peephole optimization would be able to combine "addi.d" and "ld" into one instruction?
				SixWeining: In terms of principle this can be implemented. But seems that this is only suitable for very few scenarios (maybe only when accessing a constant memory address). I'm not very sure.
				xen0n: I think this depends on the micro-architecture. If `lu12i.w; ori; lu32i.d; lu52i.d` sequences or variations of it are macro-op-fusioned, then breaking away from the current pattern could instead harm performance. OTOH it could be a net benefit but as @SixWeining pointed out this case of accessing absolute addresses is probably too rare to justify a special-case.
				xry111: Then I'll withdraw the proposal, at least for now.
				xry111: I tested some very simple cases on a 3A5000LL. It seems you can move `ori` after `lu32i.d` or `lu52i.d`, or change it to `addi.d` or even `xori` w/o performance loss. Maybe my case is too noob and can't reflect the performance of real apps though.
				xen0n: Note that I don't know whether there's macro-op fusion of this sort on 3A5000 in the first place; if this fusion actually doesn't happen on 3A5000 then there should be no performance loss whatsoever. We need input from the Loongson hardware team to be sure.
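For readers following the thread, here is a minimal sketch of how the four immediates in the `lu12i.w; ori; lu32i.d; lu52i.d` sequence relate to the 64-bit constant in this test. It is not part of the patch: the helper names `sign_extend`, `split_imm64`, and `rebuild` are hypothetical, and the instruction semantics are modeled on my reading of the LoongArch reference manual (`lu12i.w` sign-extends its result, `ori` zero-extends its immediate, `lu32i.d` replaces bits 63..32, `lu52i.d` replaces bits 63..52).

```python
MASK64 = (1 << 64) - 1


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def split_imm64(addr: int) -> dict:
    """Split a 64-bit constant into the four immediates as they print in assembly."""
    return {
        "lu12i.w": sign_extend(addr >> 12, 20),  # bits 31..12, signed 20-bit
        "ori": addr & 0xFFF,                     # bits 11..0, unsigned 12-bit
        "lu32i.d": sign_extend(addr >> 32, 20),  # bits 51..32, signed 20-bit
        "lu52i.d": sign_extend(addr >> 52, 12),  # bits 63..52, signed 12-bit
    }


def rebuild(parts: dict) -> int:
    """Model the four instructions acting on one 64-bit register."""
    reg = (parts["lu12i.w"] << 12) & MASK64                         # sign-extended to 64 bits
    reg |= parts["ori"]                                             # fill the low 12 bits
    reg = ((parts["lu32i.d"] << 32) & MASK64) | (reg & 0xFFFFFFFF)  # replace bits 63..32
    reg = ((parts["lu52i.d"] & 0xFFF) << 52) | (reg & ((1 << 52) - 1))  # replace bits 63..52
    return reg


parts = split_imm64(16045690984833335023)  # the inttoptr constant used in ld_sd_constant
print(parts)
print(hex(rebuild(parts)))
```

Since `ori` only touches the low 12 bits and the other three instructions never read them, the sketch is consistent with the observation above that `ori` can be reordered within the sequence without changing the result. Whether reordering affects performance is a separate, micro-architectural question that the sketch does not address.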
				; LA64-NEXT: lu52i.d $a2, $a1, -534
				; LA64-NEXT: ld.d $a1, $a2, 0
				; LA64-NEXT: st.d $a0, $a2, 0
				; LA64-NEXT: move $a0, $a1
				; LA64-NEXT: jirl $zero, $ra, 0
				%1 = inttoptr i64 16045690984833335023 to i64*
				%2 = load volatile i64, i64* %1
				store i64 %a, i64* %1
				ret i64 %2
				}

				;; Check load from and store to a float location.
				define float @load_store_float(float *%a, float %b) nounwind {
				; ALL-LABEL: load_store_float:
				; ALL: # %bb.0:
				; ALL-NEXT: fld.s $fa1, $a0, 4
				; ALL-NEXT: fst.s $fa0, $a0, 4
				; ALL-NEXT: fmov.s $fa0, $fa1
				; ALL-NEXT: jirl $zero, $ra, 0
				%1 = getelementptr float, float* %a, i64 1
				%2 = load float, float* %1
				store float %b, float* %1
				ret float %2
				}

				;; Check load from and store to a double location.
				define double @load_store_double(double *%a, double %b) nounwind {
				; ALL-LABEL: load_store_double:
				; ALL: # %bb.0:
				; ALL-NEXT: fld.d $fa1, $a0, 8
				; ALL-NEXT: fst.d $fa0, $a0, 8
				; ALL-NEXT: fmov.d $fa0, $fa1
				; ALL-NEXT: jirl $zero, $ra, 0
				%1 = getelementptr double, double* %a, i64 1
				%2 = load double, double* %1
				store double %b, double* %1
				ret double %2
				}