Details

Reviewers

asb
lenary

Commits

rG61c2a0bb8236: [RISCV] Fold ADDIs into load/stores with nonzero offsets

Summary

We can often fold an ADDI into the offset of load/store instructions:

(load (addi base, off1), off2) -> (load base, off1+off2)
(store val, (addi base, off1), off2) -> (store val, base, off1+off2)

This is possible when off1+off2 still fits in the 12-bit immediate. We remove the previous restriction under which we would never fold the ADDI if the load/store had a nonzero offset. We now do the fold if the resulting constant still fits a 12-bit immediate, or if off1 is a variable's address and we know from that variable's alignment that off1+off2 won't overflow. The first case doesn't currently seem to be exercised by the backend, but the code change is simple and easy to reason about, and handling it specially was actually making the code and the surrounding comments harder to understand.
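
The two fold conditions described in the summary can be sketched in isolation as follows. This is a minimal standalone illustration rather than the patch itself: `fitsSImm12`, `foldConstantOffset`, and `canFoldIntoSymbolOffset` are hypothetical names, and the symbol case simply restates the alignment comparison that appears in the diff.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper: true if V is representable as a signed 12-bit
// immediate, i.e. lies in [-2048, 2047].
constexpr bool fitsSImm12(int64_t V) { return V >= -2048 && V <= 2047; }

// Case 1: off1 is a plain constant. Fold only when off1+off2 still fits.
// Returns the combined offset, or std::nullopt if the fold must be skipped.
std::optional<int64_t> foldConstantOffset(int64_t Off1, int64_t Off2) {
  int64_t Combined = Off1 + Off2;
  if (!fitsSImm12(Combined))
    return std::nullopt;
  return Combined;
}

// Case 2: off1 is the low part of a variable's address. The fold is kept
// when off2 is zero or strictly smaller than the variable's alignment.
bool canFoldIntoSymbolOffset(uint64_t Off2, uint64_t Alignment) {
  return Off2 == 0 || Off2 < Alignment;
}
```

For instance, under these definitions a constant fold of 2040+4 is accepted while 2044+4 is rejected, and an 8-byte-aligned variable admits an extra offset of 4 while a 4-byte-aligned one does not.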

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

luismarques created this revision. May 10 2020, 11:21 AM

Herald added a project: Restricted Project. May 10 2020, 11:21 AM

Herald added subscribers: llvm-commits, evandro, apazos and 25 others.

luismarques added a parent revision: D79689: [RISCV][NFC] Add tests for folds of ADDIs into load/stores. May 10 2020, 11:24 AM

Harbormaster failed remote builds in B56266: Diff 263077! May 10 2020, 11:43 AM

This is looking like a good improvement, thanks @luismarques!

I just have one question about the ConstantSDNode part of the change.

llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
219–224	I don't understand where this code is being tested. Can you point to a testcase that changes because of this change?

luismarques marked an inline comment as done. May 11 2020, 3:39 AM

luismarques added inline comments.

llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
219–224	That's the situation I mention in the review summary. That case doesn't currently seem to be generated by LLVM, so I couldn't write a test that would be affected by it. I can try to look harder, or see whether e.g. a MIR test could cover that case. Since the code was reasonably straightforward, and the alternative (skipping on 0) was actually harder to document in the code comments, I kept the optimization: it should be reasonably easy to reason about whether it is correct, despite the lack of a specific test.

Could you rebase please? This isn't applying cleanly for me on current master.

Rebase. Now also handles the case of ConstantPoolSDNode.

Harbormaster failed remote builds in B56575: Diff 263680! May 13 2020, 6:27 AM

lenary added inline comments. May 15 2020, 5:28 AM

llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
219–224	I thought I had come up with two testcases for this, but I see I haven't, because LLVM never makes a SelectionDAG that is the equivalent of:

  t0: i32 = LUI <ConstAHi>
  t1: i32 = ADDI t0, <ConstALo>
  t2: i32 = LW t1, 0
  t3: i32 = LW t1, 4

Instead it creates:

  t0: i32 = LUI <ConstAHi>
  t1: i32 = ADDI t0, <ConstALo>
  t2: i32 = LW t1, 0
  t3: i32 = LUI <ConstBHi>
  t4: i32 = ADDI t3, <ConstBLo>
  t5: i32 = LW t4, 0

This means you still need to do this fold, but the case where the offset of the `LW` is nonzero is not exercised by LLVM.

In any case, if you want them, here are two testcases, which show that you cannot skip the fold in the ConstantSDNode case, but also that your additions to the ConstantSDNode case are not tested:

  define i64 @load_const_ok() nounwind {
  entry:
    %0 = load i64, i64* inttoptr (i32 2040 to i64*)
    ret i64 %0
  }

  define i64 @load_cost_overflow() nounwind {
  entry:
    %0 = load i64, i64* inttoptr (i64 2044 to i64*)
    ret i64 %0
  }

I'm not sure we test this kind of folding anywhere else, so I think we should add these to the testcases you add for this PR, even though there are no changes to the generated assembly after your patch.
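
The choice of 2040 and 2044 in these testcases can be checked with a little arithmetic. The sketch below assumes, as the function names suggest, that on RV32 the i64 load is split into two 32-bit loads at the given address and at that address plus 4; `fitsSImm12` and `bothHalvesFit` are hypothetical helpers introduced only for this illustration.

```cpp
#include <cstdint>

// Hypothetical helper: signed 12-bit immediate range check, [-2048, 2047].
constexpr bool fitsSImm12(int64_t V) { return V >= -2048 && V <= 2047; }

// Assumption: an i64 load at Addr becomes two 32-bit loads on RV32, one at
// Addr and one at Addr + 4. Both must fit for a direct immediate access.
constexpr bool bothHalvesFit(int64_t Addr) {
  return fitsSImm12(Addr) && fitsSImm12(Addr + 4);
}
```

Under that assumption, 2040 leaves both halves (2040 and 2044) within range, while 2044 pushes the second half to 2048, one past the largest signed 12-bit value.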

efriedma added a subscriber: efriedma. Jun 23 2020, 7:22 PM

efriedma added inline comments.

llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
230	FYI, D80368 removes GlobalValue::getAlignment(). Value::getPointerAlignment() is the suggested replacement.

Handle removal of GlobalValue::getAlignment.

luismarques marked an inline comment as done. Jun 24 2020, 10:10 AM

luismarques added inline comments.

llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
230	Thanks for the heads-up. I guess in this case we want `GlobalObject::getAlignment()`? If I understand correctly, that's the one that gives the alignment of the variable itself, which is what we need to rely upon to obtain the margin of safety before the 12-bit immediate overflow can occur.
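
The "margin of safety" argument can be checked by brute force. The sketch below is an illustration under stated assumptions, not code from the patch: `hi20`, `lo12`, and `marginHolds` are hypothetical names, addresses are restricted to a small non-negative range, and the alignment is assumed to be a power of two. It uses the usual RISC-V split in which `%hi` is rounded so that `%lo` is a signed 12-bit value.

```cpp
#include <cstdint>

// Upper part as used with LUI: rounded so the remaining low part is signed.
constexpr int64_t hi20(int64_t X) { return (X + 0x800) >> 12; }

// Signed low part, in [-2048, 2047], such that (hi20(X) << 12) + lo12(X) == X.
constexpr int64_t lo12(int64_t X) { return X - (hi20(X) << 12); }

// Returns true if, for every address in [0, Limit) that is a multiple of
// Alignment and every Off2 in [0, Alignment), lo12(Addr) + Off2 still fits a
// signed 12-bit immediate, so the same %hi(Addr) can be reused for the access.
bool marginHolds(int64_t Alignment, int64_t Limit) {
  for (int64_t Addr = 0; Addr < Limit; Addr += Alignment) {
    for (int64_t Off2 = 0; Off2 < Alignment; ++Off2) {
      int64_t Lo = lo12(Addr) + Off2;
      if (Lo < -2048 || Lo > 2047)
        return false;
    }
  }
  return true;
}
```

With these definitions the check passes for alignments such as 8 or 2048 and fails for 4096, where the alignment alone no longer bounds the offset below 2048. In the diff, the extra offset comes from the load/store's own immediate field, which presumably keeps it within the signed 12-bit range in that case as well.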

efriedma added inline comments. Jun 24 2020, 11:25 AM

llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
230	`GlobalObject::getAlignment()` is deprecated in favor of `GlobalObject::getAlign()`. Using `GlobalObject::getAlign()` isn't wrong, but it's probably not the best approach. In this context, probably the biggest issue is that it can return None; you can handle that conservatively, but handling it correctly is sort of tricky. `getPointerAlignment()` wraps up all the necessary logic in a single call.
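
One way to read "handle that conservatively" is to treat a missing alignment as the weakest possible guarantee. The sketch below is only an illustration of that idea: it uses `std::optional<uint64_t>` as a stand-in for an alignment query that may return nothing, and `conservativeAlignment` and `canFoldWithAlignment` are hypothetical names, not LLVM APIs.

```cpp
#include <cstdint>
#include <optional>

// Stand-in for an optional alignment: treat "unknown" as alignment 1, which
// provides no margin beyond an offset of zero.
uint64_t conservativeAlignment(std::optional<uint64_t> MaybeAlignment) {
  return MaybeAlignment.value_or(1);
}

// Same comparison as in the diff, applied to the conservative alignment.
bool canFoldWithAlignment(uint64_t Off2, std::optional<uint64_t> MaybeAlignment) {
  uint64_t Alignment = conservativeAlignment(MaybeAlignment);
  return Off2 == 0 || Off2 < Alignment;
}
```

This never folds a nonzero offset when the alignment is unknown, which is safe but may miss folds that a more complete alignment query would allow.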

Harbormaster failed remote builds in B61575: Diff 273080! Jun 24 2020, 12:28 PM

Use getPointerAlignment instead.

Harbormaster completed remote builds in B61608: Diff 273131. Jun 24 2020, 2:41 PM

LGTM!

This revision is now accepted and ready to land. Jul 1 2020, 7:15 AM

Closed by commit rG61c2a0bb8236: [RISCV] Fold ADDIs into load/stores with nonzero offsets (authored by luismarques). Jul 6 2020, 9:35 AM

This revision was automatically updated to reflect the committed changes.

Diff 263077

llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp

[154 lines not shown; context: bool RISCVDAGToDAGISel::SelectAddrFI(SDValue Addr, SDValue &Base) {]
   if (auto FIN = dyn_cast<FrameIndexSDNode>(Addr)) {
     Base = CurDAG->getTargetFrameIndex(FIN->getIndex(), Subtarget->getXLenVT());
     return true;
   }
   return false;
 }

 // Merge an ADDI into the offset of a load/store instruction where possible.
-// (load (add base, off), 0) -> (load base, off)
-// (store val, (add base, off)) -> (store val, base, off)
+// (load (addi base, off1), off2) -> (load base, off1+off2)
+// (store val, (addi base, off1), off2) -> (store val, base, off1+off2)
+// This is possible when off1+off2 fits a 12-bit immediate.
 void RISCVDAGToDAGISel::doPeepholeLoadStoreADDI() {
   SelectionDAG::allnodes_iterator Position(CurDAG->getRoot().getNode());
   ++Position;

   while (Position != CurDAG->allnodes_begin()) {
     SDNode *N = &*--Position;
     // Skip dead nodes and any non-machine opcodes.
     if (N->use_empty() || !N->isMachineOpcode())
[24 lines not shown; context: while (Position != CurDAG->allnodes_begin()) {]
     case RISCV::SD:
     case RISCV::FSW:
     case RISCV::FSD:
       BaseOpIdx = 1;
       OffsetOpIdx = 2;
       break;
     }

-    // Currently, the load/store offset must be 0 to be considered for this
-    // peephole optimisation.
-    if (!isa<ConstantSDNode>(N->getOperand(OffsetOpIdx)) ||
-        N->getConstantOperandVal(OffsetOpIdx) != 0)
+    if (!isa<ConstantSDNode>(N->getOperand(OffsetOpIdx)))
       continue;

     SDValue Base = N->getOperand(BaseOpIdx);

     // If the base is an ADDI, we can merge it in to the load/store.
     if (!Base.isMachineOpcode() || Base.getMachineOpcode() != RISCV::ADDI)
       continue;

     SDValue ImmOperand = Base.getOperand(1);
+    uint64_t Offset2 = N->getConstantOperandVal(OffsetOpIdx);

     if (auto Const = dyn_cast<ConstantSDNode>(ImmOperand)) {
-      ImmOperand = CurDAG->getTargetConstant(
-          Const->getSExtValue(), SDLoc(ImmOperand), ImmOperand.getValueType());
+      int64_t Offset1 = Const->getSExtValue();
+      int64_t CombinedOffset = Offset1 + Offset2;
+      if (!isInt<12>(CombinedOffset))
+        continue;
+      ImmOperand = CurDAG->getTargetConstant(CombinedOffset, SDLoc(ImmOperand),
+                                             ImmOperand.getValueType());
     } else if (auto GA = dyn_cast<GlobalAddressSDNode>(ImmOperand)) {
+      // If the off1 in (addi base, off1) is a global variable's address (its
+      // low part, really), then we can rely on the alignment of that variable
+      // to provide a margin of safety before off1 can overflow the 12 bits.
+      // Check if off2 falls within that margin; if so off1+off2 can't overflow.
+      unsigned Alignment = GA->getGlobal()->getAlignment();
+      if (Offset2 != 0 && Offset2 >= Alignment)
+        continue;
+      int64_t Offset1 = GA->getOffset();
+      int64_t CombinedOffset = Offset1 + Offset2;
       ImmOperand = CurDAG->getTargetGlobalAddress(
           GA->getGlobal(), SDLoc(ImmOperand), ImmOperand.getValueType(),
-          GA->getOffset(), GA->getTargetFlags());
+          CombinedOffset, GA->getTargetFlags());
     } else {
       continue;
     }

     LLVM_DEBUG(dbgs() << "Folding add-immediate into mem-op:\nBase: ");
     LLVM_DEBUG(Base->dump(CurDAG));
     LLVM_DEBUG(dbgs() << "\nN: ");
     LLVM_DEBUG(N->dump(CurDAG));
[21 lines not shown]

llvm/test/CodeGen/RISCV/fold-addi-loadstore.ll

[93 lines not shown; context: entry:]
   ret i64 %0
 }

 define i64 @load_g_8() nounwind {
 ; RV32I-LABEL: load_g_8:
 ; RV32I: # %bb.0: # %entry
 ; RV32I-NEXT: lui a1, %hi(g_8)
 ; RV32I-NEXT: lw a0, %lo(g_8)(a1)
-; RV32I-NEXT: addi a1, a1, %lo(g_8)
-; RV32I-NEXT: lw a1, 4(a1)
+; RV32I-NEXT: lw a1, %lo(g_8+4)(a1)
 ; RV32I-NEXT: ret
 ;
 ; RV64I-LABEL: load_g_8:
 ; RV64I: # %bb.0: # %entry
 ; RV64I-NEXT: lui a0, %hi(g_8)
 ; RV64I-NEXT: ld a0, %lo(g_8)(a0)
 ; RV64I-NEXT: ret
 entry:
   %0 = load i64, i64* @g_8
   ret i64 %0
 }

 define i64 @load_g_16() nounwind {
 ; RV32I-LABEL: load_g_16:
 ; RV32I: # %bb.0: # %entry
 ; RV32I-NEXT: lui a1, %hi(g_16)
 ; RV32I-NEXT: lw a0, %lo(g_16)(a1)
-; RV32I-NEXT: addi a1, a1, %lo(g_16)
-; RV32I-NEXT: lw a1, 4(a1)
+; RV32I-NEXT: lw a1, %lo(g_16+4)(a1)
 ; RV32I-NEXT: ret
 ;
 ; RV64I-LABEL: load_g_16:
 ; RV64I: # %bb.0: # %entry
 ; RV64I-NEXT: lui a0, %hi(g_16)
 ; RV64I-NEXT: ld a0, %lo(g_16)(a0)
 ; RV64I-NEXT: ret
 entry:
[19 lines not shown; context: entry:]
   store i64 0, i64* @g_4
   ret void
 }

 define void @store_g_8() nounwind {
 ; RV32I-LABEL: store_g_8:
 ; RV32I: # %bb.0: # %entry
 ; RV32I-NEXT: lui a0, %hi(g_8)
+; RV32I-NEXT: sw zero, %lo(g_8+4)(a0)
 ; RV32I-NEXT: sw zero, %lo(g_8)(a0)
-; RV32I-NEXT: addi a0, a0, %lo(g_8)
-; RV32I-NEXT: sw zero, 4(a0)
 ; RV32I-NEXT: ret
 ;
 ; RV64I-LABEL: store_g_8:
 ; RV64I: # %bb.0: # %entry
 ; RV64I-NEXT: lui a0, %hi(g_8)
 ; RV64I-NEXT: sd zero, %lo(g_8)(a0)
 ; RV64I-NEXT: ret
 entry:
[23 lines not shown]
 entry:
   %0 = load i64, i64* getelementptr inbounds ([2 x i64], [2 x i64]* @ga_8, i32 0, i32 1)
   ret i64 %0
 }

 define i64 @load_ga_16() nounwind {
 ; RV32I-LABEL: load_ga_16:
 ; RV32I: # %bb.0: # %entry
-; RV32I-NEXT: lui a0, %hi(ga_16)
-; RV32I-NEXT: addi a1, a0, %lo(ga_16)
-; RV32I-NEXT: lw a0, 8(a1)
-; RV32I-NEXT: lw a1, 12(a1)
+; RV32I-NEXT: lui a1, %hi(ga_16)
+; RV32I-NEXT: lw a0, %lo(ga_16+8)(a1)
+; RV32I-NEXT: lw a1, %lo(ga_16+12)(a1)
 ; RV32I-NEXT: ret
 ;
 ; RV64I-LABEL: load_ga_16:
 ; RV64I: # %bb.0: # %entry
-; RV64I-NEXT: lui a0, %hi(ga_16+8)
+; RV64I-NEXT: lui a0, %hi(ga_16)
 ; RV64I-NEXT: ld a0, %lo(ga_16+8)(a0)
 ; RV64I-NEXT: ret
 entry:
   %0 = load i64, i64* getelementptr inbounds ([2 x i64], [2 x i64]* @ga_16, i32 0, i32 1)
   ret i64 %0
 }

 ; Check for folds in accesses to thread-local variables.
[23 lines not shown]
 }

 define i64 @load_tl_8() nounwind {
 ; RV32I-LABEL: load_tl_8:
 ; RV32I: # %bb.0: # %entry
 ; RV32I-NEXT: lui a0, %tprel_hi(tl_8)
 ; RV32I-NEXT: add a1, a0, tp, %tprel_add(tl_8)
 ; RV32I-NEXT: lw a0, %tprel_lo(tl_8)(a1)
-; RV32I-NEXT: addi a1, a1, %tprel_lo(tl_8)
-; RV32I-NEXT: lw a1, 4(a1)
+; RV32I-NEXT: lw a1, %tprel_lo(tl_8+4)(a1)
 ; RV32I-NEXT: ret
 ;
 ; RV64I-LABEL: load_tl_8:
 ; RV64I: # %bb.0: # %entry
 ; RV64I-NEXT: lui a0, %tprel_hi(tl_8)
 ; RV64I-NEXT: add a0, a0, tp, %tprel_add(tl_8)
 ; RV64I-NEXT: ld a0, %tprel_lo(tl_8)(a0)
 ; RV64I-NEXT: ret
 entry:
   %0 = load i64, i64* @tl_8
   ret i64 %0
 }

This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Fold ADDIs into load/stores with nonzero offsets
Closed · Public
