This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Add custom isel for (add X, imm) used by load/stores.
Closed, Public

Authored by craig.topper on May 27 2022, 3:48 PM.

Details

Reviewers

reames
asb
luismarques
kito-cheng
khchen
arcbbb
jrtc27

Commits

rGdbead2388b48: [RISCV] Add custom isel for (add X, imm) used by load/stores.

Summary

If the imm is out of range for an ADDI, we will materialize it in
a register using multiple instructions. If the ADD is used by a
load/store, doPeepholeLoadStoreADDI can try to pull an ADDI from
the constant materialization into the load/store offset. This only
works if the ADD has a single use, otherwise the peephole would have
to rebuild multiple nodes.

This patch instead tries to solve the problem when the add is selected.
We check that the add is only used by loads/stores and if it is
we will select it to (ADDI (ADD X, Imm-Lo12), Lo12). This will enable
the simple case in doPeepholeLoadStoreADDI that can bypass an ADDI
used as a pointer. As a result we can remove the more complicated
peephole from doPeepholeLoadStoreADDI.
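
As a rough illustration of the split described above (a standalone sketch, not the code in this patch), the immediate can be divided into a sign-extended low 12 bits and the remainder. The names signExtend12, SplitOffset, splitOffset and selfCheck are hypothetical helpers introduced here; signExtend12 only stands in for llvm::SignExtend64<12>, and overflow at the extremes of int64_t is ignored.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low 12 bits of V (stand-in for llvm::SignExtend64<12>).
constexpr int64_t signExtend12(int64_t V) {
  int64_t Low = V & 0xFFF; // keep bits [11:0]
  return Low >= 0x800 ? Low - 0x1000 : Low;
}

// Hypothetical result type: Hi is materialized in a register and added with
// ADD; Lo12 stays in an ADDI that can later be folded into the load/store.
struct SplitOffset {
  int64_t Hi;
  int64_t Lo12;
};

constexpr SplitOffset splitOffset(int64_t Offset) {
  int64_t Lo12 = signExtend12(Offset);
  return {Offset - Lo12, Lo12};
}

// Worked examples taken from the offsets used in the tests of this revision.
inline void selfCheck() {
  // 4095 * 4 = 16380: Hi is what "lui 4" produces, Lo12 is -4.
  assert(splitOffset(16380).Hi == 16384);
  assert(splitOffset(16380).Lo12 == -4);
  // 20000 * 4 = 80000: Hi is what "lui 20" produces, Lo12 is -1920.
  assert(splitOffset(80000).Hi == 81920);
  assert(splitOffset(80000).Lo12 == -1920);
  // The two parts always recombine to the original offset.
  assert(splitOffset(80000).Hi + splitOffset(80000).Lo12 == 80000);
}
```

Because Lo12 is sign-extended, Hi always has its low 12 bits clear, so materializing it never needs a trailing ADDI.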

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

craig.topper created this revision. May 27 2022, 3:48 PM

Herald added a project: Restricted Project. May 27 2022, 3:48 PM

Herald added subscribers: sunshaoce, VincentWu, luke957 and 26 others.

craig.topper requested review of this revision. May 27 2022, 3:48 PM

Herald added subscribers: pcwang-thead, eopXD, MaskRay.

Harbormaster completed remote builds in B166724: Diff 432658. May 27 2022, 4:30 PM

Just a thought on an alternate approach. Feel free to ignore.

Given:
addi a2, a2, Low12C
add a2, a0, a2

Is there any reason we shouldn't canonicalize toward:
add a2, a0, a2
addi a2, a2, Low12C

That is, try to push the add with immediate towards the users? (Assume a one use restriction on the original addi.)

This isn't an optimization per se, but if we could treat it as a canonicalization, I think it simplifies the address matching problem significantly.

In D126576#3544140, @reames wrote:

Just a thought on an alternate approach. Feel free to ignore.

Given:
addi a2, a2, Low12C
add a2, a0, a2

Is there any reason we shouldn't canonicalize toward:
add a2, a0, a2
addi a2, a2, Low12C

That is, try to push the add with immediate towards the users? (Assume a one use restriction on the original addi.)

This isn't an optimization per se, but if we could treat it as a canonicalization, I think it simplifies the address matching problem significantly.

Where do you propose to canonicalize it? Another peephole between isel and doPeepholeLoadStoreADDI?

StephenFan added inline comments. May 29 2022, 9:34 PM

llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
703	(ADDI (ADD X, Imm-Hi), Imm-Lo12) ?

craig.topper added inline comments. May 29 2022, 11:10 PM

llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
703	That - was meant as subtraction.

In D126576#3544185, @craig.topper wrote:

Where do you propose to canonicalize it? Another peephole between isel and doPeepholeLoadStoreADDI?

It turns out I had a wrong mental model on SDAG handling of constants. I had been thinking we'd expanded the constant materialization sequences during legalization; looking at an example with debug output, this turns out not to be the case. As such, my assumption that this was just a DAGCombine rule is off base.

Given that, yeah, the structure would seem to require a separate peephole between ISEL and the current address matching. ("Between" might be slightly the wrong word here, as we could probably fixed-point the latter two.)

In D126576#3547631, @reames wrote:

It turns out I had a wrong mental model on SDAG handling of constants. I had been thinking we'd expanded the constant materialization sequences during legalization; looking at an example with debug output, this turns out not to be the case. As such, my assumption that this was just a DAGCombine rule is off base.

Given that, yeah, the structure would seem to require a separate peephole between ISEL and the current address matching. ("Between" might be slightly the wrong word here, as we could probably fixed-point the latter two.)

I'm not sure it would significantly simplify much over this patch. I would still want to check the memory folding opportunity before moving the ADDI across the ADD. Some targets may support LUI+ADDI macrofusion so we shouldn't split those up unless the ADDI is guaranteed to be removed.

Doing it as part of isel means we strip the lower bits off before constant materialization instead of needing to pattern match the LUI+ADDIW in a peephole. Though I guess maybe that could be avoided if we used LUI+ADDI instead of LUI+ADDIW when possible.
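
To make the LUI+ADDIW point concrete, here is a small standalone sketch (not LLVM code) of the check a peephole has to make before treating the ADDIW immediate as if it came from an ADDI. It follows the same idea as the emulation in the peephole this patch removes. The helper names isSImm32, emulateLUI, addiwActsLikeAddi and selfCheck are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// True if V fits in a signed 32-bit integer (stand-in for llvm::isInt<32>).
constexpr bool isSImm32(int64_t V) {
  return V >= INT32_MIN && V <= INT32_MAX;
}

// Value of "lui rd, Hi20" on RV64: Hi20 << 12, sign-extended from bit 31.
// Hi20 is assumed to be in the range [0, 2^20).
constexpr int64_t emulateLUI(int64_t Hi20) {
  int64_t V = Hi20 * 4096;
  return V >= (int64_t{1} << 31) ? V - (int64_t{1} << 32) : V;
}

// LUI+ADDI and LUI+ADDIW produce the same value exactly when the full 64-bit
// sum is still a simm32. Otherwise ADDIW wraps to 32 bits and sign-extends,
// so its immediate cannot simply be moved into a load/store offset.
constexpr bool addiwActsLikeAddi(int64_t Hi20, int64_t Lo12) {
  return isSImm32(emulateLUI(Hi20) + Lo12);
}

inline void selfCheck() {
  // lui 4 + addiw -4 = 16380: no wrap, so ADDIW behaves like ADDI.
  assert(addiwActsLikeAddi(4, -4));
  // lui 524288 yields -2^31 on RV64; adding -2048 leaves the simm32 range,
  // so ADDIW wraps and the sign extension is required.
  assert(!addiwActsLikeAddi(524288, -2048));
}
```

Stripping the low 12 bits before materialization sidesteps this check, since the remaining constant is built without a trailing ADDIW.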

LGTM

Not entirely thrilled with this, but don't want perfection to be the enemy of the good here. We can take this and continue to think about other approaches to the problem.

This revision is now accepted and ready to land. Jun 2 2022, 12:06 PM

In D126576#3554172, @reames wrote:

LGTM

Not entirely thrilled with this, but don't want perfection to be the enemy of the good here. We can take this and continue to think about other approaches to the problem.

Part of me wonders if we should move load/store address matching to a ComplexPattern that finds the register and offset, similar to how we do address matching on X86. I think that would allow us to remove the late peephole. Does that sound like a direction I should investigate?

In D126576#3554189, @craig.topper wrote:

Part of me wonders if we should move load/store address matching to a ComplexPattern that finds the register and offset, similar to how we do address matching on X86. I think that would allow us to remove the late peephole. Does that sound like a direction I should investigate?

Honestly, not sure. I don't yet have enough context to have a gut feel for this.

This revision was landed with ongoing or failed builds. Jun 2 2022, 1:45 PM

Closed by commit rGdbead2388b48: [RISCV] Add custom isel for (add X, imm) used by load/stores. (authored by craig.topper).

This revision was automatically updated to reflect the committed changes.

craig.topper added a commit: rGdbead2388b48: [RISCV] Add custom isel for (add X, imm) used by load/stores.

craig.topper mentioned this in D126932: [RISCV] Reduce scalar load/store isel patterns to a single ComplexPattern. NFCI. Jun 2 2022, 5:35 PM

craig.topper mentioned this in rG440285200265: [RISCV] Reduce scalar load/store isel patterns to a single ComplexPattern. NFCI. Jun 3 2022, 9:01 AM

Revision Contents

Path                                           Size
llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp    201 lines
llvm/test/CodeGen/RISCV/mem.ll                 10 lines
llvm/test/CodeGen/RISCV/mem64.ll               25 lines
llvm/test/CodeGen/RISCV/split-offsets.ll       31 lines

Diff 432658

llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp

 //===-- RISCVISelDAGToDAG.cpp - A dag to dag inst selector for RISCV ------===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//
 //
@@ 139 lines hidden @@ void RISCVDAGToDAGISel::PostprocessISelDAG() {
   }

   CurDAG->setRoot(Dummy.getValue());

   if (MadeChange)
     CurDAG->RemoveDeadNodes();
 }

+// Returns true if N is a MachineSDNode that has a reg and simm12 memory
+// operand. The indices of the base pointer and offset are returned in BaseOpIdx
+// and OffsetOpIdx.
+static bool hasMemOffset(SDNode *N, unsigned &BaseOpIdx,
+                         unsigned &OffsetOpIdx) {
+  switch (N->getMachineOpcode()) {
+  case RISCV::LB:
+  case RISCV::LH:
+  case RISCV::LW:
+  case RISCV::LBU:
+  case RISCV::LHU:
+  case RISCV::LWU:
+  case RISCV::LD:
+  case RISCV::FLH:
+  case RISCV::FLW:
+  case RISCV::FLD:
+    BaseOpIdx = 0;
+    OffsetOpIdx = 1;
+    return true;
+  case RISCV::SB:
+  case RISCV::SH:
+  case RISCV::SW:
+  case RISCV::SD:
+  case RISCV::FSH:
+  case RISCV::FSW:
+  case RISCV::FSD:
+    BaseOpIdx = 1;
+    OffsetOpIdx = 2;
+    return true;
+  }
+
+  return false;
+}

 static SDNode *selectImmWithConstantPool(SelectionDAG *CurDAG, const SDLoc &DL,
                                          const MVT VT, int64_t Imm,
                                          const RISCVSubtarget &Subtarget) {
   assert(VT == MVT::i64 && "Expecting MVT::i64");
   const RISCVTargetLowering *TLI = Subtarget.getTargetLowering();
   ConstantPoolSDNode *CP = cast<ConstantPoolSDNode>(CurDAG->getConstantPool(
       ConstantInt::get(EVT(VT).getTypeForEVT(*CurDAG->getContext()), Imm), VT));
   SDValue Addr = TLI->getAddr(CP, *CurDAG);
@@ 495 lines hidden @@ void RISCVDAGToDAGISel::Select(SDNode *Node) {
   }
   case ISD::FrameIndex: {
     SDValue Imm = CurDAG->getTargetConstant(0, DL, XLenVT);
     int FI = cast<FrameIndexSDNode>(Node)->getIndex();
     SDValue TFI = CurDAG->getTargetFrameIndex(FI, VT);
     ReplaceNode(Node, CurDAG->getMachineNode(RISCV::ADDI, DL, VT, TFI, Imm));
     return;
   }
+  case ISD::ADD: {
+    // Try to select ADD + immediate used as memory addresses to
+    // (ADDI (ADD X, Imm-Lo12), Lo12) if it will allow the ADDI to be removed by
+    // doPeepholeLoadStoreADDI.
+
+    // LHS should be an immediate.
+    auto *N1C = dyn_cast<ConstantSDNode>(Node->getOperand(1));
+    if (!N1C)
+      break;
+
+    int64_t Offset = N1C->getSExtValue();
+    int64_t Lo12 = SignExtend64<12>(Offset);
+
+    // Don't do this if the lower 12 bits are 0 or we could use ADDI directly.
+    if (Lo12 == 0 || isInt<12>(Offset))
+      break;
+
+    // Don't do this if we can use a pair of ADDIs.
+    if (isInt<12>(Offset / 2) && isInt<12>(Offset - Offset / 2))
+      break;
+
+    bool AllPointerUses = true;
+    for (auto UI = Node->use_begin(), UE = Node->use_end(); UI != UE; ++UI) {
+      SDNode *User = *UI;
+
+      // Is this user a memory instruction that uses a register and immediate
+      // that has this ADD as its pointer.
+      unsigned BaseOpIdx, OffsetOpIdx;
+      if (!User->isMachineOpcode() ||
+          !hasMemOffset(User, BaseOpIdx, OffsetOpIdx) ||
+          UI.getOperandNo() != BaseOpIdx) {
+        AllPointerUses = false;
+        break;
+      }
+
+      // If the memory instruction already has an offset, make sure the combined
+      // offset is foldable.
+      int64_t MemOffs =
+          cast<ConstantSDNode>(User->getOperand(OffsetOpIdx))->getSExtValue();
+      MemOffs += Lo12;
+      if (!isInt<12>(MemOffs)) {
+        AllPointerUses = false;
+        break;
+      }
+    }
+
+    if (!AllPointerUses)
+      break;
+
+    Offset -= Lo12;
+    // Restore sign bits for RV32.
+    if (!Subtarget->is64Bit())
+      Offset = SignExtend64<32>(Offset);
+
+    // Emit (ADDI (ADD X, Hi), Lo)
+    SDNode *Imm = selectImm(CurDAG, DL, VT, Offset, *Subtarget);
+    SDNode *ADD = CurDAG->getMachineNode(RISCV::ADD, DL, VT,
+                                         Node->getOperand(0), SDValue(Imm, 0));
+    SDNode *ADDI =
+        CurDAG->getMachineNode(RISCV::ADDI, DL, VT, SDValue(ADD, 0),
+                               CurDAG->getTargetConstant(Lo12, DL, VT));
+    ReplaceNode(Node, ADDI);
+    return;
+  }
   case ISD::SRL: {
     // Optimize (srl (and X, C2), C) ->
     //          (srli (slli X, (XLen-C3), (XLen-C3) + C)
     // Where C2 is a mask with C3 trailing ones.
     // Taking into account that the C2 may have had lower bits unset by
     // SimplifyDemandedBits. This avoids materializing the C2 immediate.
     // This pattern occurs when type legalizing right shifts for types with
     // less than XLen bits.
@@ 1,417 lines hidden @@
 // (load (addi base, off1), off2) -> (load base, off1+off2)
 // (store val, (addi base, off1), off2) -> (store val, base, off1+off2)
 // (load (add base, (addi src, off1)), off2)
 //    -> (load (add base, src), off1+off2)
 // (store val, (add base, (addi src, off1)), off2)
 //    -> (store val, (add base, src), off1+off2)
 // This is possible when off1+off2 fits a 12-bit immediate.
 bool RISCVDAGToDAGISel::doPeepholeLoadStoreADDI(SDNode *N) {
-  int OffsetOpIdx;
-  int BaseOpIdx;
-
-  // Only attempt this optimisation for I-type loads and S-type stores.
-  switch (N->getMachineOpcode()) {
-  default:
+  unsigned OffsetOpIdx, BaseOpIdx;
+  if (!hasMemOffset(N, BaseOpIdx, OffsetOpIdx))
     return false;
-  case RISCV::LB:
-  case RISCV::LH:
-  case RISCV::LW:
-  case RISCV::LBU:
-  case RISCV::LHU:
-  case RISCV::LWU:
-  case RISCV::LD:
-  case RISCV::FLH:
-  case RISCV::FLW:
-  case RISCV::FLD:
-    BaseOpIdx = 0;
-    OffsetOpIdx = 1;
-    break;
-  case RISCV::SB:
-  case RISCV::SH:
-  case RISCV::SW:
-  case RISCV::SD:
-  case RISCV::FSH:
-  case RISCV::FSW:
-  case RISCV::FSD:
-    BaseOpIdx = 1;
-    OffsetOpIdx = 2;
-    break;
-  }

   if (!isa<ConstantSDNode>(N->getOperand(OffsetOpIdx)))
     return false;

   SDValue Base = N->getOperand(BaseOpIdx);

   if (!Base.isMachineOpcode())
     return false;

-  // There is a ADD between ADDI and load/store. We can only fold ADDI that
-  // do not have a FrameIndex operand.
-  SDValue Add;
-  unsigned AddBaseIdx;
-  if (Base.getMachineOpcode() == RISCV::ADD && Base.hasOneUse()) {
-    Add = Base;
-    SDValue Op0 = Base.getOperand(0);
-    SDValue Op1 = Base.getOperand(1);
-    if (Op0.isMachineOpcode() && Op0.getMachineOpcode() == RISCV::ADDI &&
-        !isa<FrameIndexSDNode>(Op0.getOperand(0)) &&
-        isa<ConstantSDNode>(Op0.getOperand(1))) {
-      AddBaseIdx = 1;
-      Base = Op0;
-    } else if (Op1.isMachineOpcode() && Op1.getMachineOpcode() == RISCV::ADDI &&
-               !isa<FrameIndexSDNode>(Op1.getOperand(0)) &&
-               isa<ConstantSDNode>(Op1.getOperand(1))) {
-      AddBaseIdx = 0;
-      Base = Op1;
-    } else if (Op1.isMachineOpcode() &&
-               Op1.getMachineOpcode() == RISCV::ADDIW &&
-               isa<ConstantSDNode>(Op1.getOperand(1)) &&
-               Op1.getOperand(0).isMachineOpcode() &&
-               Op1.getOperand(0).getMachineOpcode() == RISCV::LUI) {
-      // We found an LUI+ADDIW constant materialization. We might be able to
-      // fold the ADDIW offset if it could be treated as ADDI.
-      // Emulate the constant materialization to see if the result would be
-      // a simm32 if ADDI was used instead of ADDIW.
-
-      // First the LUI.
-      uint64_t Imm = Op1.getOperand(0).getConstantOperandVal(0);
-      Imm <<= 12;
-      Imm = SignExtend64<32>(Imm);
-
-      // Then the ADDI.
-      uint64_t LoImm = cast<ConstantSDNode>(Op1.getOperand(1))->getSExtValue();
-      Imm += LoImm;
-
-      // If the result isn't a simm32, we can't do the optimization.
-      if (!isInt<32>(Imm))
-        return false;
-
-      AddBaseIdx = 0;
-      Base = Op1;
-    } else
-      return false;
-  } else if (Base.getMachineOpcode() == RISCV::ADDI) {
-    // If the base is an ADDI, we can merge it in to the load/store.
-  } else
+  // If the base is an ADDI, we can merge it in to the load/store.
+  if (Base.getMachineOpcode() != RISCV::ADDI)
     return false;

   SDValue ImmOperand = Base.getOperand(1);
   uint64_t Offset2 = N->getConstantOperandVal(OffsetOpIdx);

   if (auto *Const = dyn_cast<ConstantSDNode>(ImmOperand)) {
     int64_t Offset1 = Const->getSExtValue();
     int64_t CombinedOffset = Offset1 + Offset2;
@@ 30 lines hidden @@ bool RISCVDAGToDAGISel::doPeepholeLoadStoreADDI(SDNode *N) {
   }

   LLVM_DEBUG(dbgs() << "Folding add-immediate into mem-op:\nBase: ");
   LLVM_DEBUG(Base->dump(CurDAG));
   LLVM_DEBUG(dbgs() << "\nN: ");
   LLVM_DEBUG(N->dump(CurDAG));
   LLVM_DEBUG(dbgs() << "\n");

-  if (Add)
-    Add = SDValue(CurDAG->UpdateNodeOperands(Add.getNode(),
-                                             Add.getOperand(AddBaseIdx),
-                                             Base.getOperand(0)),
-                  0);
-
   // Modify the offset operand of the load/store.
   if (BaseOpIdx == 0) { // Load
-    if (Add)
-      N = CurDAG->UpdateNodeOperands(N, Add, ImmOperand, N->getOperand(2));
-    else
-      N = CurDAG->UpdateNodeOperands(N, Base.getOperand(0), ImmOperand,
-                                     N->getOperand(2));
+    N = CurDAG->UpdateNodeOperands(N, Base.getOperand(0), ImmOperand,
+                                   N->getOperand(2));
   } else { // Store
-    if (Add)
-      N = CurDAG->UpdateNodeOperands(N, N->getOperand(0), Add, ImmOperand,
-                                     N->getOperand(3));
-    else
-      N = CurDAG->UpdateNodeOperands(N, N->getOperand(0), Base.getOperand(0),
-                                     ImmOperand, N->getOperand(3));
+    N = CurDAG->UpdateNodeOperands(N, N->getOperand(0), Base.getOperand(0),
+                                   ImmOperand, N->getOperand(3));
   }

   return true;
 }

 // Try to remove sext.w if the input is a W instruction or can be made into
 // a W instruction cheaply.
 bool RISCVDAGToDAGISel::doPeepholeSExtW(SDNode *N) {
@@ 180 lines hidden @@
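
The conditions that the new ISD::ADD case checks before splitting an offset can be modeled outside LLVM. The following is only a sketch under that assumption: isSImm12, signExtend12, shouldSplit, userCanFold and selfCheck are hypothetical names, with isSImm12 and signExtend12 standing in for llvm::isInt<12> and llvm::SignExtend64<12>.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for llvm::isInt<12>: does V fit in a signed 12-bit immediate?
constexpr bool isSImm12(int64_t V) { return V >= -2048 && V <= 2047; }

// Stand-in for llvm::SignExtend64<12>.
constexpr int64_t signExtend12(int64_t V) {
  int64_t Low = V & 0xFFF;
  return Low >= 0x800 ? Low - 0x1000 : Low;
}

// Models the early exits: skip the split when there is nothing to fold,
// when one ADDI suffices, or when a pair of ADDIs can be used instead.
constexpr bool shouldSplit(int64_t Offset) {
  if (signExtend12(Offset) == 0 || isSImm12(Offset))
    return false;
  if (isSImm12(Offset / 2) && isSImm12(Offset - Offset / 2))
    return false;
  return true;
}

// Models the per-user check: a load/store that already carries MemOffs can
// absorb Lo12 only if the combined offset is still a simm12.
constexpr bool userCanFold(int64_t MemOffs, int64_t Lo12) {
  return isSImm12(MemOffs + Lo12);
}

inline void selfCheck() {
  assert(shouldSplit(80000));   // lui 20, then fold -1920 into the users
  assert(!shouldSplit(4000));   // 2000 + 2000: a pair of ADDIs is enough
  assert(!shouldSplit(8192));   // low 12 bits are zero
  assert(!shouldSplit(-100));   // fits a single ADDI
  assert(userCanFold(4, -1920));     // 4 + -1920 = -1916, as in test1
  assert(!userCanFold(-200, -1920)); // -2120 is out of simm12 range
}
```

In the patch itself the per-user check runs over every use of the ADD, and a single failing user disables the transformation for the whole node.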

llvm/test/CodeGen/RISCV/mem.ll

@@ 221 lines hidden @@ ; RV32I-NEXT: ret
   store i32 %b, i32* %1
   ret void
 }

 define i32 @lw_sw_far_local(i32* %a, i32 %b) {
 ; RV32I-LABEL: lw_sw_far_local:
 ; RV32I: # %bb.0:
 ; RV32I-NEXT: lui a2, 4
-; RV32I-NEXT: addi a2, a2, -4
 ; RV32I-NEXT: add a2, a0, a2
-; RV32I-NEXT: lw a0, 0(a2)
-; RV32I-NEXT: sw a1, 0(a2)
+; RV32I-NEXT: lw a0, -4(a2)
+; RV32I-NEXT: sw a1, -4(a2)
 ; RV32I-NEXT: ret
   %1 = getelementptr inbounds i32, i32* %a, i64 4095
   %2 = load volatile i32, i32* %1
   store i32 %b, i32* %1
   ret i32 %2
 }

 define i32 @lw_really_far_local(i32* %a) {
@@ 19 lines hidden @@ ; RV32I-NEXT: ret
   store i32 %b, i32* %1
   ret void
 }

 define i32 @lw_sw_really_far_local(i32* %a, i32 %b) {
 ; RV32I-LABEL: lw_sw_really_far_local:
 ; RV32I: # %bb.0:
 ; RV32I-NEXT: lui a2, 524288
-; RV32I-NEXT: addi a2, a2, -2048
 ; RV32I-NEXT: add a2, a0, a2
-; RV32I-NEXT: lw a0, 0(a2)
-; RV32I-NEXT: sw a1, 0(a2)
+; RV32I-NEXT: lw a0, -2048(a2)
+; RV32I-NEXT: sw a1, -2048(a2)
 ; RV32I-NEXT: ret
   %1 = getelementptr inbounds i32, i32* %a, i32 536870400
   %2 = load volatile i32, i32* %1
   store i32 %b, i32* %1
   ret i32 %2
 }

 %struct.quux = type { i32, [0 x i8] }
@@ 27 lines hidden @@

llvm/test/CodeGen/RISCV/mem64.ll

@@ 251 lines hidden @@ ; RV64I-NEXT: ret
   store i64 %b, i64* %1
   ret void
 }

 define i64 @lw_sw_far_local(i64* %a, i64 %b) {
 ; RV64I-LABEL: lw_sw_far_local:
 ; RV64I: # %bb.0:
 ; RV64I-NEXT: lui a2, 8
-; RV64I-NEXT: addiw a2, a2, -8
 ; RV64I-NEXT: add a2, a0, a2
-; RV64I-NEXT: ld a0, 0(a2)
-; RV64I-NEXT: sd a1, 0(a2)
+; RV64I-NEXT: ld a0, -8(a2)
+; RV64I-NEXT: sd a1, -8(a2)
 ; RV64I-NEXT: ret
   %1 = getelementptr inbounds i64, i64* %a, i64 4095
   %2 = load volatile i64, i64* %1
   store i64 %b, i64* %1
   ret i64 %2
 }

 ; Make sure we don't fold the addiw into the load offset. The sign extend of the
 ; addiw is required.
 define i64 @lw_really_far_local(i64* %a) {
 ; RV64I-LABEL: lw_really_far_local:
 ; RV64I: # %bb.0:
-; RV64I-NEXT: lui a1, 524288
-; RV64I-NEXT: addiw a1, a1, -2048
+; RV64I-NEXT: li a1, 1
+; RV64I-NEXT: slli a1, a1, 31
 ; RV64I-NEXT: add a0, a0, a1
-; RV64I-NEXT: ld a0, 0(a0)
+; RV64I-NEXT: ld a0, -2048(a0)
 ; RV64I-NEXT: ret
   %1 = getelementptr inbounds i64, i64* %a, i64 268435200
   %2 = load volatile i64, i64* %1
   ret i64 %2
 }

 ; Make sure we don't fold the addiw into the store offset. The sign extend of
 ; the addiw is required.
 define void @st_really_far_local(i64* %a, i64 %b) {
 ; RV64I-LABEL: st_really_far_local:
 ; RV64I: # %bb.0:
-; RV64I-NEXT: lui a2, 524288
-; RV64I-NEXT: addiw a2, a2, -2048
+; RV64I-NEXT: li a2, 1
+; RV64I-NEXT: slli a2, a2, 31
 ; RV64I-NEXT: add a0, a0, a2
-; RV64I-NEXT: sd a1, 0(a0)
+; RV64I-NEXT: sd a1, -2048(a0)
 ; RV64I-NEXT: ret
   %1 = getelementptr inbounds i64, i64* %a, i64 268435200
   store i64 %b, i64* %1
   ret void
 }

 ; Make sure we don't fold the addiw into the load/store offset. The sign extend
 ; of the addiw is required.
 define i64 @lw_sw_really_far_local(i64* %a, i64 %b) {
 ; RV64I-LABEL: lw_sw_really_far_local:
 ; RV64I: # %bb.0:
-; RV64I-NEXT: lui a2, 524288
-; RV64I-NEXT: addiw a2, a2, -2048
+; RV64I-NEXT: li a2, 1
+; RV64I-NEXT: slli a2, a2, 31
 ; RV64I-NEXT: add a2, a0, a2
-; RV64I-NEXT: ld a0, 0(a2)
-; RV64I-NEXT: sd a1, 0(a2)
+; RV64I-NEXT: ld a0, -2048(a2)
+; RV64I-NEXT: sd a1, -2048(a2)
 ; RV64I-NEXT: ret
   %1 = getelementptr inbounds i64, i64* %a, i64 268435200
   %2 = load volatile i64, i64* %1
   store i64 %b, i64* %1
   ret i64 %2
 }

 %struct.quux = type { i32, [0 x i8] }
@@ 27 lines hidden @@

llvm/test/CodeGen/RISCV/split-offsets.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -mtriple=riscv32 -verify-machineinstrs < %s \
 ; RUN:   | FileCheck %s -check-prefix=RV32I
 ; RUN: llc -mtriple=riscv64 -verify-machineinstrs < %s \
 ; RUN:   | FileCheck %s -check-prefix=RV64I

 ; Check that memory accesses to array elements with large offsets have those
 ; offsets split into a base offset, plus a smaller offset that is folded into
 ; the memory operation. We should also only compute that base offset once,
 ; since it can be shared for all memory operations in this test.
 define void @test1([65536 x i32]** %sp, [65536 x i32]* %t, i32 %n) {
 ; RV32I-LABEL: test1:
 ; RV32I: # %bb.0: # %entry
 ; RV32I-NEXT: lw a0, 0(a0)
 ; RV32I-NEXT: lui a2, 20
-; RV32I-NEXT: addi a2, a2, -1920
 ; RV32I-NEXT: add a1, a1, a2
 ; RV32I-NEXT: add a0, a0, a2
 ; RV32I-NEXT: li a2, 2
-; RV32I-NEXT: sw a2, 0(a0)
+; RV32I-NEXT: sw a2, -1920(a0)
 ; RV32I-NEXT: li a3, 1
-; RV32I-NEXT: sw a3, 4(a0)
-; RV32I-NEXT: sw a3, 0(a1)
-; RV32I-NEXT: sw a2, 4(a1)
+; RV32I-NEXT: sw a3, -1916(a0)
+; RV32I-NEXT: sw a3, -1920(a1)
+; RV32I-NEXT: sw a2, -1916(a1)
 ; RV32I-NEXT: ret
 ;
 ; RV64I-LABEL: test1:
 ; RV64I: # %bb.0: # %entry
 ; RV64I-NEXT: ld a0, 0(a0)
 ; RV64I-NEXT: lui a2, 20
-; RV64I-NEXT: addiw a2, a2, -1920
 ; RV64I-NEXT: add a1, a1, a2
 ; RV64I-NEXT: add a0, a0, a2
 ; RV64I-NEXT: li a2, 2
-; RV64I-NEXT: sw a2, 0(a0)
+; RV64I-NEXT: sw a2, -1920(a0)
 ; RV64I-NEXT: li a3, 1
-; RV64I-NEXT: sw a3, 4(a0)
-; RV64I-NEXT: sw a3, 0(a1)
-; RV64I-NEXT: sw a2, 4(a1)
+; RV64I-NEXT: sw a3, -1916(a0)
+; RV64I-NEXT: sw a3, -1920(a1)
+; RV64I-NEXT: sw a2, -1916(a1)
 ; RV64I-NEXT: ret
 entry:
   %s = load [65536 x i32]*, [65536 x i32]** %sp
   %gep0 = getelementptr [65536 x i32], [65536 x i32]* %s, i64 0, i32 20000
   %gep1 = getelementptr [65536 x i32], [65536 x i32]* %s, i64 0, i32 20001
   %gep2 = getelementptr [65536 x i32], [65536 x i32]* %t, i64 0, i32 20000
   %gep3 = getelementptr [65536 x i32], [65536 x i32]* %t, i64 0, i32 20001
   store i32 2, i32* %gep0
@@ 66 lines hidden @@ while_body:
   store i32 %i, i32* %gep2
   store i32 %phi, i32* %gep3
   br label %while_cond
 while_end:
   ret void
 }

 ; GEPs have been manually split so the base GEP does not get used by any memory
-; instructions. Make sure we use a small offset in each of the stores.
+; instructions. Make sure we use an offset and common base for each of the
+; stores.
 define void @test3([65536 x i32]* %t) {
 ; RV32I-LABEL: test3:
 ; RV32I: # %bb.0: # %entry
 ; RV32I-NEXT: lui a1, 20
-; RV32I-NEXT: addi a1, a1, -1920
 ; RV32I-NEXT: add a0, a0, a1
 ; RV32I-NEXT: li a1, 2
-; RV32I-NEXT: sw a1, 4(a0)
+; RV32I-NEXT: sw a1, -1916(a0)
 ; RV32I-NEXT: li a1, 3
-; RV32I-NEXT: sw a1, 8(a0)
+; RV32I-NEXT: sw a1, -1912(a0)
 ; RV32I-NEXT: ret
 ;
 ; RV64I-LABEL: test3:
 ; RV64I: # %bb.0: # %entry
 ; RV64I-NEXT: lui a1, 20
-; RV64I-NEXT: addiw a1, a1, -1920
 ; RV64I-NEXT: add a0, a0, a1
 ; RV64I-NEXT: li a1, 2
-; RV64I-NEXT: sw a1, 4(a0)
+; RV64I-NEXT: sw a1, -1916(a0)
 ; RV64I-NEXT: li a1, 3
-; RV64I-NEXT: sw a1, 8(a0)
+; RV64I-NEXT: sw a1, -1912(a0)
 ; RV64I-NEXT: ret
 entry:
   %0 = bitcast [65536 x i32]* %t to i8*
   %splitgep = getelementptr i8, i8* %0, i64 80000
   %1 = getelementptr i8, i8* %splitgep, i64 4
   %2 = bitcast i8* %1 to i32*
   %3 = getelementptr i8, i8* %splitgep, i64 8
   %4 = bitcast i8* %3 to i32*
   store i32 2, i32* %2, align 4
   store i32 3, i32* %4, align 4
   ret void
 }