This is an archive of the discontinued LLVM Phabricator instance.

Differential D129686

[RISCV] Reuse a materialised global address in preference to merging into a load/store
Needs Review · Public

Authored by asb on Jul 13 2022, 12:30 PM.

Details

Reviewers

reames
craig.topper
luismarques

Summary

This isn't ready for full review - I need to clean up and add in more tests that reflect issues found when working on this. But posting for any initial feedback from other timezones ahead of me looking again tomorrow (UK time).

I noted in D129178 that in some cases we see code sequences like:

lui a1, %hi(.L_MergedGlobals)
sw a0, %lo(.L_MergedGlobals)(a1)
addi a1, a1, %lo(.L_MergedGlobals)
... (other users of a1)

In such sequences, altering the sw to use the global address once it's fully materialised into a1 might be beneficial for code size (increasing the chance the sw is compressible). Such code patterns can exist without globals merging, but the globals merging code makes them much more common.
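
To make the code-size argument concrete, here is a minimal C++ sketch of when a `sw` can use the 2-byte `c.sw` encoding. The `StoreWord` struct and the helper names are hypothetical and exist only for this illustration; they are not LLVM APIs. The sketch also assumes that a store whose offset is still a symbolic `%lo(...)` is never compressed, because its numeric value is not known when the compressed form would be chosen.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a `sw` instruction; these names are not LLVM's.
struct StoreWord {
  unsigned SrcReg;     // x-register number of the stored value
  unsigned BaseReg;    // x-register number of the base address
  bool SymbolicOffset; // true for an unresolved %lo(sym) offset
  int64_t Offset;      // numeric offset, ignored when SymbolicOffset is set
};

// c.sw can only encode registers x8..x15 (s0, s1, a0..a5).
constexpr bool isCompressedReg(unsigned Reg) { return Reg >= 8 && Reg <= 15; }

// c.sw encodes a zero-extended offset that is a multiple of 4 in [0, 124].
constexpr bool fitsCSW(const StoreWord &S) {
  return !S.SymbolicOffset && S.Offset >= 0 && S.Offset <= 124 &&
         S.Offset % 4 == 0 && isCompressedReg(S.SrcReg) &&
         isCompressedReg(S.BaseReg);
}

// Encoded size in bytes: 2 for c.sw, 4 for the uncompressed sw.
constexpr unsigned encodedSize(const StoreWord &S) { return fitsCSW(S) ? 2 : 4; }

// sw a0, %lo(.L_MergedGlobals)(a1): a0 is x10, a1 is x11.
constexpr StoreWord FoldedStore{10, 11, true, 0};
// sw a0, 0(a1), issued after a1 holds the fully materialised address.
constexpr StoreWord ZeroOffsetStore{10, 11, false, 0};

static_assert(encodedSize(FoldedStore) == 4, "symbolic offset: 4-byte sw");
static_assert(encodedSize(ZeroOffsetStore) == 2, "zero offset: 2-byte c.sw");
```

Under these assumptions, reusing the materialised address saves two bytes on the store, while the `lui` and `addi` are needed either way because a1 has other users.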

This patch achieves this by:

- Altering SelectAddrRegImm so it won't fold in an ADD_LO if the C extension is enabled and it has users that aren't memory operations (which is typically the case when other offsets of the global are calculated with ADDs, which normally results in the global address being materialised into a register).
  - TODO: it would be best to disable this if the load/store is a half-word or a float on RV64C (as there are no compressed forms of those instructions available anyway).
- Adding a peephole for handling the case where this turned out to be a bad idea, which can happen if all of those global offset calculations were merged into memory operations. Given the work Craig has done to remove the store/addi peephole, this isn't ideal...
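
The selection rule in the first item can be sketched in isolation as below. This is a simplified model, not the patch itself: `UseKind`, `AddLoNode`, and `shouldFoldAddLo` are hypothetical names standing in for SelectionDAG nodes and for the `!Subtarget->hasStdExtC() || isWorthFoldingAdd(Addr)` condition in the diff, and the real check also inspects the memory value type of each use.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the users of an ADD_LO node; not LLVM's API.
enum class UseKind { ScalarLoad, ScalarStore, Add };

struct AddLoNode {
  std::vector<UseKind> Uses;
};

// True if every user is a scalar load or store.
bool onlyUsedByScalarMemOps(const AddLoNode &N) {
  for (UseKind U : N.Uses)
    if (U != UseKind::ScalarLoad && U != UseKind::ScalarStore)
      return false;
  return true;
}

// Fold the ADD_LO into the memory operation unless the C extension is
// enabled and the node also has non-memory users.
bool shouldFoldAddLo(const AddLoNode &N, bool HasStdExtC) {
  return !HasStdExtC || onlyUsedByScalarMemOps(N);
}

// Example inputs: one node used only by memory operations, and one that
// also feeds an ADD computing another offset of the global.
const AddLoNode MemOnly{{UseKind::ScalarLoad, UseKind::ScalarStore}};
const AddLoNode MixedUses{{UseKind::ScalarStore, UseKind::Add}};
```

With `MixedUses` and the C extension enabled, the fold is skipped, so the store is left with a zero offset against the materialised address. The peephole in the second item exists to undo that choice when the non-memory users later disappear.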

Event Timeline

asb created this revision. · Jul 13 2022, 12:30 PM

Herald added a project: Restricted Project. · Jul 13 2022, 12:30 PM

Herald added subscribers: wingo, sunshaoce, pmatos and 30 others.

asb requested review of this revision. · Jul 13 2022, 12:30 PM

Herald added a project: Restricted Project. · Jul 13 2022, 12:30 PM

Herald added subscribers: pcwang-thead, eopXD, MaskRay.

Harbormaster completed remote builds in B175214: Diff 444379. · Jul 13 2022, 12:31 PM

asb added a parent revision: D129178: [RISCV] Enable the GlobalMerge pass by default. · Jul 13 2022, 12:47 PM

asb mentioned this in D130082: [RISCV][test] Update fold-addi-loadstore.ll for D129686. · Jul 19 2022, 5:56 AM

Rebase, and only alter the codegen path if the C extension is enabled, update patch summary.

Any thoughts on this kind of transformation? I'm thinking it may make more sense as an extension to RISCVMakeCompressible (as then we can additionally gate the transformation on whether the load/store operand registers are compressible in the first place). And perhaps we want some parts of RISCVMakeCompressible to run even for optimisation modes other than Os/Oz (see e.g. #56390 for another case this might be worthwhile).

Harbormaster completed remote builds in B176239: Diff 445795. · Jul 19 2022, 6:03 AM

asb added a parent revision: D130082: [RISCV][test] Update fold-addi-loadstore.ll for D129686. · Jul 19 2022, 6:04 AM

luismarques mentioned this in D129178: [RISCV] Enable the GlobalMerge pass by default. · Jul 27 2022, 4:01 PM

reames added inline comments. · Jan 30 2023, 12:11 PM

llvm/test/CodeGen/RISCV/fold-addi-loadstore.ll
84	Looking at your test here, there's something interesting lurking. You probably already knew this, but it took me a while staring at the code to figure this out.

We're materializing two addresses here. Those addresses are known to be four bytes apart, but we do not know that e.g. the high bits are common. The two addresses could e.g. straddle a 12-bit aligned boundary, and thus the high bits of the second may not match those of the first. If we did know this fact, we could use the common high bits and fold both low parts into loads. (Your load_g_8 actually covers exactly that.) So, conceptually, the RV32 split case does look exactly like two merged i32 globals.

Now, I do have to say that this test case may not matter. The RV32 split i64 case should be 8 byte aligned, and thus not hit this. I do see why this is interesting from a global merging perspective though.

Coming at this a bit differently, are we allowed to increase the alignment of the merged global? If we are, we might make cases look more like load_g_8 than load_g_1.

Another idea: Do we have any way to express the opposite of alignment? That is, we don't care that an address is aligned unless the result would cross some boundary? I remember we have such a concept for instructions (e.g. boundary align, thanks JCC errata!), but if we could spell something like that for globals, then we could leave their addresses unaligned unless they crossed the boundary which allows the high bits to differ. Haven't explored that much, so not sure what we have here.
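
The boundary concern in this comment can be checked with a small worked example. The sketch below assumes 32-bit absolute addressing with the usual lui/addi split, where the low part is a signed 12-bit value and the high part is rounded to compensate; the function names are made up for this illustration. Because of the sign extension, the high part changes at offset 0x800 within each 4 KiB page.

```cpp
#include <cassert>
#include <cstdint>

// High 20 bits as used by lui, rounded so that a signed low part recombines.
constexpr uint32_t hi20(uint32_t Addr) { return (Addr + 0x800u) >> 12; }

// Signed low 12 bits as used by addi and by load/store offsets.
constexpr int32_t lo12(uint32_t Addr) {
  return static_cast<int32_t>(Addr & 0xfffu) - ((Addr & 0x800u) ? 0x1000 : 0);
}

// True if the words at Addr and Addr + 4 can share a single lui result.
constexpr bool wordsShareHi(uint32_t Addr) {
  return hi20(Addr) == hi20(Addr + 4);
}

// Checks every Align-aligned address in the 4 KiB page starting at Base.
bool allShareHi(uint32_t Base, uint32_t Align) {
  for (uint32_t Off = 0; Off < 4096; Off += Align)
    if (!wordsShareHi(Base + Off))
      return false;
  return true;
}

static_assert((hi20(0x12345ffcu) << 12) + lo12(0x12345ffcu) == 0x12345ffcu,
              "hi and lo parts recombine to the original address");
static_assert(!wordsShareHi(0x100007fcu), "4-byte aligned: halves can split");
static_assert(wordsShareHi(0x100007f8u), "8-byte aligned: halves share hi");
```

In this model a 4-byte aligned i64 can place its two halves on either side of the 0x800 point, while an 8-byte aligned one cannot, which is consistent with the difference between the load_g_4 and load_g_8 tests.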

Herald added a subscriber: luke. · Jan 30 2023, 12:11 PM

asb added inline comments. · Feb 2 2023, 6:01 AM

llvm/test/CodeGen/RISCV/fold-addi-loadstore.ll
84	> Coming at this a bit differently, are we allowed to increase the alignment of the merged global? If we are, we might make cases look more like load_g_8 than load_g_1.

That seems like a promising idea. I'm not aware of anything that would prevent us from increasing alignment. GlobalMerge currently sets the alignment of the merged global to the maximum alignment of its merged members, and maintains alignment for the members using a struct and adding padding as necessary - I think additional padding could just be added.

Re the second idea: I don't think I've seen anything that would give that so far, but perhaps I've missed it. And as you suggest, there's some precedent elsewhere.
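
As a rough illustration of the layout rule described in this reply, the sketch below places members in a struct with padding, takes the maximum member alignment as the struct alignment, and optionally raises it. It is not GlobalMerge's code: `Member`, `Layout`, `layoutMerged`, and the `MinAlign` parameter are hypothetical names chosen for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one global: size and alignment in bytes.
struct Member {
  uint64_t Size;
  uint64_t Align;
};

// Result of laying out the members in one merged struct.
struct Layout {
  std::vector<uint64_t> Offsets; // byte offset of each member
  uint64_t Align;                // alignment of the merged global
  uint64_t Size;                 // total size including padding
};

// Rounds Value up to the next multiple of Align.
constexpr uint64_t alignTo(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) / Align * Align;
}

// Places members in order, padding each to its own alignment. The merged
// alignment is the maximum member alignment, raised to MinAlign if larger.
Layout layoutMerged(const std::vector<Member> &Members, uint64_t MinAlign = 1) {
  Layout L{{}, MinAlign, 0};
  for (const Member &M : Members) {
    L.Size = alignTo(L.Size, M.Align);
    L.Offsets.push_back(L.Size);
    L.Size += M.Size;
    L.Align = std::max(L.Align, M.Align);
  }
  L.Size = alignTo(L.Size, L.Align);
  return L;
}

// Example inputs: two i32 globals, and an i8 followed by an i32.
const std::vector<Member> TwoWords = {{4, 4}, {4, 4}};
const std::vector<Member> ByteThenWord = {{1, 1}, {4, 4}};
```

Calling `layoutMerged(TwoWords, 8)` models the suggestion of raising the merged global's alignment: the member offsets stay at 0 and 4, and only the alignment of the whole struct changes.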

luke added inline comments. · Feb 2 2023, 6:10 AM

llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
2389	/in that is the case/in if that is the case

luke957 removed a subscriber: luke957. · Feb 2 2023, 11:28 PM

Revision Contents

Path                                              Size
llvm/lib/Target/RISCV/RISCVISelDAGToDAG.h         1 line
llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp       95 lines
llvm/test/CodeGen/RISCV/fold-addi-loadstore.ll    171 lines
llvm/test/CodeGen/RISCV/global-merge-minsize.ll   4 lines
llvm/test/CodeGen/RISCV/global-merge-offset.ll    3 lines
llvm/test/CodeGen/RISCV/global-merge.ll           4 lines

Diff 445795

llvm/lib/Target/RISCV/RISCVISelDAGToDAG.h

[... 122 lines not shown ...]	case ISD::SETUGE:
return RISCVCC::COND_GEU;		return RISCVCC::COND_GEU;
}		}
}		}

// Include the pieces autogenerated from the target description.		// Include the pieces autogenerated from the target description.
#include "RISCVGenDAGISel.inc"		#include "RISCVGenDAGISel.inc"

private:		private:
		bool doPeepholeLoadStoreADDI(SDNode *Node);
bool doPeepholeSExtW(SDNode *Node);		bool doPeepholeSExtW(SDNode *Node);
bool doPeepholeMaskedRVV(SDNode *Node);		bool doPeepholeMaskedRVV(SDNode *Node);
};		};

namespace RISCV {		namespace RISCV {
struct VLSEGPseudo {		struct VLSEGPseudo {
uint16_t NF : 4;		uint16_t NF : 4;
uint16_t Masked : 1;		uint16_t Masked : 1;
[... 88 lines not shown ...]

llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp

[... 140 lines not shown ...]	void RISCVDAGToDAGISel::PostprocessISelDAG() {
bool MadeChange = false;		bool MadeChange = false;
while (Position != CurDAG->allnodes_begin()) {		while (Position != CurDAG->allnodes_begin()) {
SDNode *N = &*--Position;		SDNode *N = &*--Position;
// Skip dead nodes and any non-machine opcodes.		// Skip dead nodes and any non-machine opcodes.
if (N->use_empty() || !N->isMachineOpcode())		if (N->use_empty() || !N->isMachineOpcode())
continue;		continue;

MadeChange |= doPeepholeSExtW(N);		MadeChange |= doPeepholeSExtW(N);
		MadeChange |= doPeepholeLoadStoreADDI(N);
MadeChange |= doPeepholeMaskedRVV(N);		MadeChange |= doPeepholeMaskedRVV(N);
}		}

CurDAG->setRoot(Dummy.getValue());		CurDAG->setRoot(Dummy.getValue());

if (MadeChange)		if (MadeChange)
CurDAG->RemoveDeadNodes();		CurDAG->RemoveDeadNodes();
}		}

		// Returns true if N is a MachineSDNode that has a reg and a constant zero
		// memory operand. The indices of the base pointer and offset are returned in
		// BaseOpIdx and OffsetOpIdx.
		static bool hasConstantZeroMemOffset(SDNode *N, unsigned &BaseOpIdx,
		unsigned &OffsetOpIdx) {
		if (!N->isMachineOpcode())
		return false;

		switch (N->getMachineOpcode()) {
		case RISCV::LB:
		case RISCV::LH:
		case RISCV::LW:
		case RISCV::LBU:
		case RISCV::LHU:
		case RISCV::LWU:
		case RISCV::LD:
		case RISCV::FLH:
		case RISCV::FLW:
		case RISCV::FLD:
		BaseOpIdx = 0;
		OffsetOpIdx = 1;
		break;
		case RISCV::SB:
		case RISCV::SH:
		case RISCV::SW:
		case RISCV::SD:
		case RISCV::FSH:
		case RISCV::FSW:
		case RISCV::FSD:
		BaseOpIdx = 1;
		OffsetOpIdx = 2;
		break;
		default:
		return false;
		}

		if (!isa<ConstantSDNode>(N->getOperand(OffsetOpIdx)))
		return false;

		return (N->getConstantOperandVal(OffsetOpIdx) == 0);
		}

static SDNode *selectImmSeq(SelectionDAG *CurDAG, const SDLoc &DL, const MVT VT,		static SDNode *selectImmSeq(SelectionDAG *CurDAG, const SDLoc &DL, const MVT VT,
RISCVMatInt::InstSeq &Seq) {		RISCVMatInt::InstSeq &Seq) {
SDNode *Result = nullptr;		SDNode *Result = nullptr;
SDValue SrcReg = CurDAG->getRegister(RISCV::X0, VT);		SDValue SrcReg = CurDAG->getRegister(RISCV::X0, VT);
for (RISCVMatInt::Inst &Inst : Seq) {		for (RISCVMatInt::Inst &Inst : Seq) {
SDValue SDImm = CurDAG->getTargetConstant(Inst.Imm, DL, VT);		SDValue SDImm = CurDAG->getTargetConstant(Inst.Imm, DL, VT);
switch (Inst.getOpndKind()) {		switch (Inst.getOpndKind()) {
case RISCVMatInt::Imm:		case RISCVMatInt::Imm:
[... 1,650 lines not shown ...]	static bool selectConstantAddr(SelectionDAG *CurDAG, const SDLoc &DL,
Seq.pop_back();		Seq.pop_back();
assert(!Seq.empty() && "Expected more instructions in sequence");		assert(!Seq.empty() && "Expected more instructions in sequence");

Base = SDValue(selectImmSeq(CurDAG, DL, VT, Seq), 0);		Base = SDValue(selectImmSeq(CurDAG, DL, VT, Seq), 0);
Offset = CurDAG->getTargetConstant(Lo12, DL, VT);		Offset = CurDAG->getTargetConstant(Lo12, DL, VT);
return true;		return true;
}		}

// Is this ADD instruction only used as the base pointer of scalar loads and		// Is this ADD/ADD_LO instruction only used as the base pointer of scalar
// stores?		// loads and stores?
static bool isWorthFoldingAdd(SDValue Add) {		static bool isWorthFoldingAdd(SDValue Add) {
for (auto Use : Add->uses()) {		for (auto Use : Add->uses()) {
if (Use->getOpcode() != ISD::LOAD && Use->getOpcode() != ISD::STORE &&		if (Use->getOpcode() != ISD::LOAD && Use->getOpcode() != ISD::STORE &&
Use->getOpcode() != ISD::ATOMIC_LOAD &&		Use->getOpcode() != ISD::ATOMIC_LOAD &&
Use->getOpcode() != ISD::ATOMIC_STORE)		Use->getOpcode() != ISD::ATOMIC_STORE)
return false;		return false;
EVT VT = cast<MemSDNode>(Use)->getMemoryVT();		EVT VT = cast<MemSDNode>(Use)->getMemoryVT();
if (!VT.isScalarInteger() && VT != MVT::f16 && VT != MVT::f32 &&		if (!VT.isScalarInteger() && VT != MVT::f16 && VT != MVT::f32 &&
[... 14 lines not shown ...]
bool RISCVDAGToDAGISel::SelectAddrRegImm(SDValue Addr, SDValue &Base,		bool RISCVDAGToDAGISel::SelectAddrRegImm(SDValue Addr, SDValue &Base,
SDValue &Offset) {		SDValue &Offset) {
if (SelectAddrFrameIndex(Addr, Base, Offset))		if (SelectAddrFrameIndex(Addr, Base, Offset))
return true;		return true;

SDLoc DL(Addr);		SDLoc DL(Addr);
MVT VT = Addr.getSimpleValueType();		MVT VT = Addr.getSimpleValueType();

		// Select the Base and Offset from the ADD_LO except in the case that the
		// ADD_LO is used in non-memory instruction (e.g. as a base to an add) and
		// the compressed extension is present. In that case, leaving it separated
		// may increase the chance of compressing the load/store.
if (Addr.getOpcode() == RISCVISD::ADD_LO) {		if (Addr.getOpcode() == RISCVISD::ADD_LO && (!Subtarget->hasStdExtC() ||
		isWorthFoldingAdd(Addr))) {
Base = Addr.getOperand(0);		Base = Addr.getOperand(0);
Offset = Addr.getOperand(1);		Offset = Addr.getOperand(1);
return true;		return true;
}		}

if (CurDAG->isBaseWithConstantOffset(Addr)) {		if (CurDAG->isBaseWithConstantOffset(Addr)) {
int64_t CVal = cast<ConstantSDNode>(Addr.getOperand(1))->getSExtValue();		int64_t CVal = cast<ConstantSDNode>(Addr.getOperand(1))->getSExtValue();
if (isInt<12>(CVal)) {		if (isInt<12>(CVal)) {
Base = Addr.getOperand(0);		Base = Addr.getOperand(0);
[... 465 lines not shown ...]	if (auto *C = dyn_cast<ConstantSDNode>(N)) {

Imm = CurDAG->getTargetConstant(ImmVal, SDLoc(N), Subtarget->getXLenVT());		Imm = CurDAG->getTargetConstant(ImmVal, SDLoc(N), Subtarget->getXLenVT());
return true;		return true;
}		}

return false;		return false;
}		}

		// SelectAddrRegImm won't merge an ADD_LO into a memory operation if it has
		// uses that aren't scalar loads and stores. This will turn out to be a bad
		// decision if all those other uses end up being merged into memory
		// operations. This peephole folds the resulting ADDI back in that is the
		// case.
		bool RISCVDAGToDAGISel::doPeepholeLoadStoreADDI(SDNode *N) {
		unsigned OffsetOpIdx, BaseOpIdx;
		if (!hasConstantZeroMemOffset(N, BaseOpIdx, OffsetOpIdx))
		return false;

		SDValue Base = N->getOperand(BaseOpIdx);
		if (!Base.isMachineOpcode())
		return false;
		if (Base.getMachineOpcode() != RISCV::ADDI)
		return false;
		if (!isa<GlobalAddressSDNode>(Base.getOperand(1)))
		return false;
		for (auto Use : Base->uses()) {
		unsigned Dummy1, Dummy2;
		if (!hasConstantZeroMemOffset(Use, Dummy1, Dummy2))
		return false;
		}

		LLVM_DEBUG(dbgs() << "Folding add-immediate into mem-op:\nBase: ");
		LLVM_DEBUG(Base->dump(CurDAG));
		LLVM_DEBUG(dbgs() << "\nN: ");
		LLVM_DEBUG(N->dump(CurDAG));
		LLVM_DEBUG(dbgs() << "\n");

		if (BaseOpIdx == 0) { // Load
		N = CurDAG->UpdateNodeOperands(N, Base.getOperand(0), Base.getOperand(1),
		N->getOperand(2));
		} else { // Store
		N = CurDAG->UpdateNodeOperands(N, N->getOperand(0), Base.getOperand(0),
		Base.getOperand(1), N->getOperand(3));
		}

		return true;
		}

// Try to remove sext.w if the input is a W instruction or can be made into		// Try to remove sext.w if the input is a W instruction or can be made into
// a W instruction cheaply.		// a W instruction cheaply.
bool RISCVDAGToDAGISel::doPeepholeSExtW(SDNode *N) {		bool RISCVDAGToDAGISel::doPeepholeSExtW(SDNode *N) {
// Look for the sext.w pattern, addiw rd, rs1, 0.		// Look for the sext.w pattern, addiw rd, rs1, 0.
if (N->getMachineOpcode() != RISCV::ADDIW ||		if (N->getMachineOpcode() != RISCV::ADDIW ||
!isNullConstant(N->getOperand(1)))		!isNullConstant(N->getOperand(1)))
return false;		return false;

[... 165 lines not shown ...]

llvm/test/CodeGen/RISCV/fold-addi-loadstore.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=riscv32 -verify-machineinstrs < %s \		; RUN: llc -mtriple=riscv32 -verify-machineinstrs < %s \
; RUN: | FileCheck -check-prefix=RV32 %s		; RUN: | FileCheck -check-prefixes=RV32,RV32I %s
		; RUN: llc -mtriple=riscv32 -mattr=+c -verify-machineinstrs < %s \
		; RUN: | FileCheck -check-prefixes=RV32,RV32C %s
; RUN: llc -mtriple=riscv64 -verify-machineinstrs < %s \		; RUN: llc -mtriple=riscv64 -verify-machineinstrs < %s \
; RUN: | FileCheck -check-prefix=RV64 %s		; RUN: | FileCheck -check-prefixes=RV64,RV64I %s
		; RUN: llc -mtriple=riscv64 -mattr=+c -verify-machineinstrs < %s \
		; RUN: | FileCheck -check-prefixes=RV64,RV64C %s

; We can often fold an ADDI into the offset of load/store instructions:		; We can often fold an ADDI into the offset of load/store instructions:
; (load (addi base, off1), off2) -> (load base, off1+off2)		; (load (addi base, off1), off2) -> (load base, off1+off2)
; (store val, (addi base, off1), off2) -> (store val, base, off1+off2)		; (store val, (addi base, off1), off2) -> (store val, base, off1+off2)
; This is possible when the off1+off2 continues to fit the 12-bit immediate.		; This is possible when the off1+off2 continues to fit the 12-bit immediate.
; Check if we do the fold under various conditions. If off1 is (the low part of)		; Check if we do the fold under various conditions. If off1 is (the low part of)
; an address the fold's safety depends on the variable's alignment.		; an address the fold's safety depends on the variable's alignment.

[... 18 lines not shown ...]
; RV64-NEXT: ld a0, %lo(g_0)(a0)		; RV64-NEXT: ld a0, %lo(g_0)(a0)
; RV64-NEXT: ret		; RV64-NEXT: ret
entry:		entry:
%0 = load i64, i64* @g_0		%0 = load i64, i64* @g_0
ret i64 %0		ret i64 %0
}		}

define dso_local i64 @load_g_1() nounwind {		define dso_local i64 @load_g_1() nounwind {
; RV32-LABEL: load_g_1:		; RV32I-LABEL: load_g_1:
; RV32: # %bb.0: # %entry		; RV32I: # %bb.0: # %entry
; RV32-NEXT: lui a1, %hi(g_1)		; RV32I-NEXT: lui a1, %hi(g_1)
; RV32-NEXT: lw a0, %lo(g_1)(a1)		; RV32I-NEXT: lw a0, %lo(g_1)(a1)
; RV32-NEXT: addi a1, a1, %lo(g_1)		; RV32I-NEXT: addi a1, a1, %lo(g_1)
; RV32-NEXT: lw a1, 4(a1)		; RV32I-NEXT: lw a1, 4(a1)
; RV32-NEXT: ret		; RV32I-NEXT: ret
		;
		; RV32C-LABEL: load_g_1:
		; RV32C: # %bb.0: # %entry
		; RV32C-NEXT: lui a0, %hi(g_1)
		; RV32C-NEXT: addi a1, a0, %lo(g_1)
		; RV32C-NEXT: lw a0, 0(a1)
		; RV32C-NEXT: lw a1, 4(a1)
		; RV32C-NEXT: ret
;		;
; RV64-LABEL: load_g_1:		; RV64-LABEL: load_g_1:
; RV64: # %bb.0: # %entry		; RV64: # %bb.0: # %entry
; RV64-NEXT: lui a0, %hi(g_1)		; RV64-NEXT: lui a0, %hi(g_1)
; RV64-NEXT: ld a0, %lo(g_1)(a0)		; RV64-NEXT: ld a0, %lo(g_1)(a0)
; RV64-NEXT: ret		; RV64-NEXT: ret
entry:		entry:
%0 = load i64, i64* @g_1		%0 = load i64, i64* @g_1
ret i64 %0		ret i64 %0
}		}

define dso_local i64 @load_g_2() nounwind {		define dso_local i64 @load_g_2() nounwind {
; RV32-LABEL: load_g_2:		; RV32I-LABEL: load_g_2:
; RV32: # %bb.0: # %entry		; RV32I: # %bb.0: # %entry
; RV32-NEXT: lui a1, %hi(g_2)		; RV32I-NEXT: lui a1, %hi(g_2)
; RV32-NEXT: lw a0, %lo(g_2)(a1)		; RV32I-NEXT: lw a0, %lo(g_2)(a1)
; RV32-NEXT: addi a1, a1, %lo(g_2)		; RV32I-NEXT: addi a1, a1, %lo(g_2)
; RV32-NEXT: lw a1, 4(a1)		; RV32I-NEXT: lw a1, 4(a1)
; RV32-NEXT: ret		; RV32I-NEXT: ret
		;
		; RV32C-LABEL: load_g_2:
		; RV32C: # %bb.0: # %entry
		; RV32C-NEXT: lui a0, %hi(g_2)
		; RV32C-NEXT: addi a1, a0, %lo(g_2)
		; RV32C-NEXT: lw a0, 0(a1)
		; RV32C-NEXT: lw a1, 4(a1)
		; RV32C-NEXT: ret
;		;
; RV64-LABEL: load_g_2:		; RV64-LABEL: load_g_2:
; RV64: # %bb.0: # %entry		; RV64: # %bb.0: # %entry
; RV64-NEXT: lui a0, %hi(g_2)		; RV64-NEXT: lui a0, %hi(g_2)
; RV64-NEXT: ld a0, %lo(g_2)(a0)		; RV64-NEXT: ld a0, %lo(g_2)(a0)
; RV64-NEXT: ret		; RV64-NEXT: ret
entry:		entry:
%0 = load i64, i64* @g_2		%0 = load i64, i64* @g_2
ret i64 %0		ret i64 %0
}		}

define dso_local i64 @load_g_4() nounwind {		define dso_local i64 @load_g_4() nounwind {
; RV32-LABEL: load_g_4:		; RV32I-LABEL: load_g_4:
; RV32: # %bb.0: # %entry		; RV32I: # %bb.0: # %entry
; RV32-NEXT: lui a1, %hi(g_4)		; RV32I-NEXT: lui a1, %hi(g_4)
; RV32-NEXT: lw a0, %lo(g_4)(a1)		; RV32I-NEXT: lw a0, %lo(g_4)(a1)
; RV32-NEXT: addi a1, a1, %lo(g_4)		; RV32I-NEXT: addi a1, a1, %lo(g_4)
; RV32-NEXT: lw a1, 4(a1)		; RV32I-NEXT: lw a1, 4(a1)
; RV32-NEXT: ret		; RV32I-NEXT: ret
		;
		; RV32C-LABEL: load_g_4:
		; RV32C: # %bb.0: # %entry
		; RV32C-NEXT: lui a0, %hi(g_4)
		; RV32C-NEXT: addi a1, a0, %lo(g_4)
		; RV32C-NEXT: lw a0, 0(a1)
		; RV32C-NEXT: lw a1, 4(a1)
		; RV32C-NEXT: ret
;		;
; RV64-LABEL: load_g_4:		; RV64-LABEL: load_g_4:
; RV64: # %bb.0: # %entry		; RV64: # %bb.0: # %entry
; RV64-NEXT: lui a0, %hi(g_4)		; RV64-NEXT: lui a0, %hi(g_4)
; RV64-NEXT: ld a0, %lo(g_4)(a0)		; RV64-NEXT: ld a0, %lo(g_4)(a0)
; RV64-NEXT: ret		; RV64-NEXT: ret
entry:		entry:
%0 = load i64, i64* @g_4		%0 = load i64, i64* @g_4
[... 32 lines not shown ...]
; RV64-NEXT: ld a0, %lo(g_16)(a0)		; RV64-NEXT: ld a0, %lo(g_16)(a0)
; RV64-NEXT: ret		; RV64-NEXT: ret
entry:		entry:
%0 = load i64, i64* @g_16		%0 = load i64, i64* @g_16
ret i64 %0		ret i64 %0
}		}

define dso_local void @store_g_4() nounwind {		define dso_local void @store_g_4() nounwind {
; RV32-LABEL: store_g_4:		; RV32I-LABEL: store_g_4:
; RV32: # %bb.0: # %entry		; RV32I: # %bb.0: # %entry
; RV32-NEXT: lui a0, %hi(g_4)		; RV32I-NEXT: lui a0, %hi(g_4)
; RV32-NEXT: sw zero, %lo(g_4)(a0)		; RV32I-NEXT: sw zero, %lo(g_4)(a0)
; RV32-NEXT: addi a0, a0, %lo(g_4)		; RV32I-NEXT: addi a0, a0, %lo(g_4)
; RV32-NEXT: sw zero, 4(a0)		; RV32I-NEXT: sw zero, 4(a0)
; RV32-NEXT: ret		; RV32I-NEXT: ret
		;
		; RV32C-LABEL: store_g_4:
		; RV32C: # %bb.0: # %entry
		; RV32C-NEXT: lui a0, %hi(g_4)
		; RV32C-NEXT: addi a0, a0, %lo(g_4)
		; RV32C-NEXT: sw zero, 4(a0)
		; RV32C-NEXT: sw zero, 0(a0)
		; RV32C-NEXT: ret
;		;
; RV64-LABEL: store_g_4:		; RV64-LABEL: store_g_4:
; RV64: # %bb.0: # %entry		; RV64: # %bb.0: # %entry
; RV64-NEXT: lui a0, %hi(g_4)		; RV64-NEXT: lui a0, %hi(g_4)
; RV64-NEXT: sd zero, %lo(g_4)(a0)		; RV64-NEXT: sd zero, %lo(g_4)(a0)
; RV64-NEXT: ret		; RV64-NEXT: ret
entry:		entry:
store i64 0, i64* @g_4		store i64 0, i64* @g_4
[... 91 lines not shown ...]	entry:
ret i64 %0		ret i64 %0
}		}

; Check for folds in accesses to multiple array offsets.		; Check for folds in accesses to multiple array offsets.

@ga32 = dso_local global [4 x i32] zeroinitializer, align 4		@ga32 = dso_local global [4 x i32] zeroinitializer, align 4

define dso_local i32 @load_ga32_multi() nounwind {		define dso_local i32 @load_ga32_multi() nounwind {
; RV32-LABEL: load_ga32_multi:		; RV32I-LABEL: load_ga32_multi:
; RV32: # %bb.0: # %entry		; RV32I: # %bb.0: # %entry
; RV32-NEXT: lui a0, %hi(ga32)		; RV32I-NEXT: lui a0, %hi(ga32)
; RV32-NEXT: lw a1, %lo(ga32)(a0)		; RV32I-NEXT: lw a1, %lo(ga32)(a0)
; RV32-NEXT: addi a0, a0, %lo(ga32)		; RV32I-NEXT: addi a0, a0, %lo(ga32)
; RV32-NEXT: lw a0, 4(a0)		; RV32I-NEXT: lw a0, 4(a0)
; RV32-NEXT: add a0, a1, a0		; RV32I-NEXT: add a0, a1, a0
; RV32-NEXT: ret		; RV32I-NEXT: ret
;		;
; RV64-LABEL: load_ga32_multi:		; RV32C-LABEL: load_ga32_multi:
; RV64: # %bb.0: # %entry		; RV32C: # %bb.0: # %entry
; RV64-NEXT: lui a0, %hi(ga32)		; RV32C-NEXT: lui a0, %hi(ga32)
; RV64-NEXT: lw a1, %lo(ga32)(a0)		; RV32C-NEXT: addi a0, a0, %lo(ga32)
; RV64-NEXT: addi a0, a0, %lo(ga32)		; RV32C-NEXT: lw a1, 0(a0)
; RV64-NEXT: lw a0, 4(a0)		; RV32C-NEXT: lw a0, 4(a0)
; RV64-NEXT: addw a0, a1, a0		; RV32C-NEXT: add a0, a0, a1
; RV64-NEXT: ret		; RV32C-NEXT: ret
		;
		; RV64I-LABEL: load_ga32_multi:
		; RV64I: # %bb.0: # %entry
		; RV64I-NEXT: lui a0, %hi(ga32)
		; RV64I-NEXT: lw a1, %lo(ga32)(a0)
		; RV64I-NEXT: addi a0, a0, %lo(ga32)
		; RV64I-NEXT: lw a0, 4(a0)
		; RV64I-NEXT: addw a0, a1, a0
		; RV64I-NEXT: ret
		;
		; RV64C-LABEL: load_ga32_multi:
		; RV64C: # %bb.0: # %entry
		; RV64C-NEXT: lui a0, %hi(ga32)
		; RV64C-NEXT: addi a0, a0, %lo(ga32)
		; RV64C-NEXT: lw a1, 0(a0)
		; RV64C-NEXT: lw a0, 4(a0)
		; RV64C-NEXT: addw a0, a0, a1
		; RV64C-NEXT: ret
entry:		entry:
%0 = load i32, i32* getelementptr inbounds ([4 x i32], [4 x i32]* @ga32, i32 0, i32 0)		%0 = load i32, i32* getelementptr inbounds ([4 x i32], [4 x i32]* @ga32, i32 0, i32 0)
%1 = load i32, i32* getelementptr inbounds ([4 x i32], [4 x i32]* @ga32, i32 0, i32 1)		%1 = load i32, i32* getelementptr inbounds ([4 x i32], [4 x i32]* @ga32, i32 0, i32 1)
%2 = add i32 %0, %1		%2 = add i32 %0, %1
ret i32 %2		ret i32 %2
}		}

; Check for folds in accesses to thread-local variables.		; Check for folds in accesses to thread-local variables.

@tl_4 = dso_local thread_local global i64 0, align 4		@tl_4 = dso_local thread_local global i64 0, align 4
@tl_8 = dso_local thread_local global i64 0, align 8		@tl_8 = dso_local thread_local global i64 0, align 8

define dso_local i64 @load_tl_4() nounwind {		define dso_local i64 @load_tl_4() nounwind {
; RV32-LABEL: load_tl_4:		; RV32I-LABEL: load_tl_4:
; RV32: # %bb.0: # %entry		; RV32I: # %bb.0: # %entry
; RV32-NEXT: lui a0, %tprel_hi(tl_4)		; RV32I-NEXT: lui a0, %tprel_hi(tl_4)
; RV32-NEXT: add a1, a0, tp, %tprel_add(tl_4)		; RV32I-NEXT: add a1, a0, tp, %tprel_add(tl_4)
; RV32-NEXT: lw a0, %tprel_lo(tl_4)(a1)		; RV32I-NEXT: lw a0, %tprel_lo(tl_4)(a1)
; RV32-NEXT: addi a1, a1, %tprel_lo(tl_4)		; RV32I-NEXT: addi a1, a1, %tprel_lo(tl_4)
; RV32-NEXT: lw a1, 4(a1)		; RV32I-NEXT: lw a1, 4(a1)
; RV32-NEXT: ret		; RV32I-NEXT: ret
		;
		; RV32C-LABEL: load_tl_4:
		; RV32C: # %bb.0: # %entry
		; RV32C-NEXT: lui a0, %tprel_hi(tl_4)
		; RV32C-NEXT: add a0, a0, tp, %tprel_add(tl_4)
		; RV32C-NEXT: addi a1, a0, %tprel_lo(tl_4)
		; RV32C-NEXT: lw a0, 0(a1)
		; RV32C-NEXT: lw a1, 4(a1)
		; RV32C-NEXT: ret
;		;
; RV64-LABEL: load_tl_4:		; RV64-LABEL: load_tl_4:
; RV64: # %bb.0: # %entry		; RV64: # %bb.0: # %entry
; RV64-NEXT: lui a0, %tprel_hi(tl_4)		; RV64-NEXT: lui a0, %tprel_hi(tl_4)
; RV64-NEXT: add a0, a0, tp, %tprel_add(tl_4)		; RV64-NEXT: add a0, a0, tp, %tprel_add(tl_4)
; RV64-NEXT: ld a0, %tprel_lo(tl_4)(a0)		; RV64-NEXT: ld a0, %tprel_lo(tl_4)(a0)
; RV64-NEXT: ret		; RV64-NEXT: ret
entry:		entry:
[... 94 lines not shown ...]

llvm/test/CodeGen/RISCV/global-merge-minsize.ll

	[... 19 lines not shown ...]
	; CHECK-NEXT: lui a1, %hi(eg2)			; CHECK-NEXT: lui a1, %hi(eg2)
	; CHECK-NEXT: sw a0, %lo(eg2)(a1)			; CHECK-NEXT: sw a0, %lo(eg2)(a1)
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	store i32 %a, ptr @eg1, align 4			store i32 %a, ptr @eg1, align 4
	store i32 %a, ptr @eg2, align 4			store i32 %a, ptr @eg2, align 4
	ret void			ret void
	}			}

	; TODO: It would be better for code size to alter the first store below by
	; first fully materialising .L_MergedGlobals in a1 and then storing to it with
	; a 0 offset.

	define void @f2(i32 %a) nounwind minsize optsize {			define void @f2(i32 %a) nounwind minsize optsize {
	; CHECK-LABEL: f2:			; CHECK-LABEL: f2:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: lui a1, %hi(.L_MergedGlobals)			; CHECK-NEXT: lui a1, %hi(.L_MergedGlobals)
	; CHECK-NEXT: sw a0, %lo(.L_MergedGlobals)(a1)			; CHECK-NEXT: sw a0, %lo(.L_MergedGlobals)(a1)
	; CHECK-NEXT: addi a1, a1, %lo(.L_MergedGlobals)			; CHECK-NEXT: addi a1, a1, %lo(.L_MergedGlobals)
	; CHECK-NEXT: sw a0, 4(a1)			; CHECK-NEXT: sw a0, 4(a1)
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	store i32 %a, ptr @eg3, align 4			store i32 %a, ptr @eg3, align 4
	store i32 %a, ptr @eg4, align 4			store i32 %a, ptr @eg4, align 4
	ret void			ret void
	}			}

llvm/test/CodeGen/RISCV/global-merge-offset.ll

	[... 9 lines not shown ...]

	; This test demonstrates that the MaxOffset is set correctly for RISC-V by			; This test demonstrates that the MaxOffset is set correctly for RISC-V by
	; constructing an input that is at the limit and comparing.			; constructing an input that is at the limit and comparing.

	@ga1 = dso_local global [410 x i32] zeroinitializer, align 4			@ga1 = dso_local global [410 x i32] zeroinitializer, align 4
	@ga2 = dso_local global [ArrSize x i32] zeroinitializer, align 4			@ga2 = dso_local global [ArrSize x i32] zeroinitializer, align 4
	@gi = dso_local global i32 0, align 4			@gi = dso_local global i32 0, align 4

	; TODO: It would be better for codesize if the final store below was
	; `sw a0, 0(a2)`.

	define void @f1(i32 %a) nounwind {			define void @f1(i32 %a) nounwind {
	; CHECK-LABEL: f1:			; CHECK-LABEL: f1:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: lui a1, %hi(.L_MergedGlobals)			; CHECK-NEXT: lui a1, %hi(.L_MergedGlobals)
	; CHECK-NEXT: addi a2, a1, %lo(.L_MergedGlobals)			; CHECK-NEXT: addi a2, a1, %lo(.L_MergedGlobals)
	; CHECK-NEXT: sw a0, 2044(a2)			; CHECK-NEXT: sw a0, 2044(a2)
	; CHECK-NEXT: sw a0, 404(a2)			; CHECK-NEXT: sw a0, 404(a2)
	; CHECK-NEXT: sw a0, %lo(.L_MergedGlobals)(a1)			; CHECK-NEXT: sw a0, %lo(.L_MergedGlobals)(a1)
	[... 18 lines not shown ...]

llvm/test/CodeGen/RISCV/global-merge.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=riscv32 -riscv-enable-global-merge -verify-machineinstrs < %s \			; RUN: llc -mtriple=riscv32 -riscv-enable-global-merge -verify-machineinstrs < %s \
	; RUN: | FileCheck %s			; RUN: | FileCheck %s
	; RUN: llc -mtriple=riscv64 -riscv-enable-global-merge -verify-machineinstrs < %s \			; RUN: llc -mtriple=riscv64 -riscv-enable-global-merge -verify-machineinstrs < %s \
	; RUN: | FileCheck %s			; RUN: | FileCheck %s

	@ig1 = internal global i32 0, align 4			@ig1 = internal global i32 0, align 4
	@ig2 = internal global i32 0, align 4			@ig2 = internal global i32 0, align 4

	@eg1 = dso_local global i32 0, align 4			@eg1 = dso_local global i32 0, align 4
	@eg2 = dso_local global i32 0, align 4			@eg2 = dso_local global i32 0, align 4

	; TODO: It would be better for code size to alter the first store below by
	; first fully materialising .L_MergedGlobals in a1 and then storing to it with
	; a 0 offset.

	define void @f1(i32 %a) nounwind {			define void @f1(i32 %a) nounwind {
	; CHECK-LABEL: f1:			; CHECK-LABEL: f1:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: lui a1, %hi(.L_MergedGlobals)			; CHECK-NEXT: lui a1, %hi(.L_MergedGlobals)
	; CHECK-NEXT: sw a0, %lo(.L_MergedGlobals)(a1)			; CHECK-NEXT: sw a0, %lo(.L_MergedGlobals)(a1)
	; CHECK-NEXT: addi a1, a1, %lo(.L_MergedGlobals)			; CHECK-NEXT: addi a1, a1, %lo(.L_MergedGlobals)
	; CHECK-NEXT: sw a0, 4(a1)			; CHECK-NEXT: sw a0, 4(a1)
	; CHECK-NEXT: sw a0, 8(a1)			; CHECK-NEXT: sw a0, 8(a1)
	; CHECK-NEXT: sw a0, 12(a1)			; CHECK-NEXT: sw a0, 12(a1)
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	store i32 %a, ptr @ig1, align 4			store i32 %a, ptr @ig1, align 4
	store i32 %a, ptr @ig2, align 4			store i32 %a, ptr @ig2, align 4
	store i32 %a, ptr @eg1, align 4			store i32 %a, ptr @eg1, align 4
	store i32 %a, ptr @eg2, align 4			store i32 %a, ptr @eg2, align 4
	ret void			ret void
	}			}