
Details

Reviewers

craig.topper
frasercrmck
asb
kito-cheng

Commits

rG18fda867f477: [RISCV] Optimize scalable frame offset calculation when VLEN is precisely known

Summary

When we have a precisely known VLEN, we can replace runtime usage of VLENB with compile time constants. This converts offsets involving both fixed and scalable components into fixed offsets. The result is that we avoid the csr read of vlenb, and can often fold the multiply as well.
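As a rough illustration of the folding described in the summary, the sketch below models a stack offset as a fixed byte count plus a scalable count and collapses the two when the minimum and maximum VLEN agree. The `Offset` struct and the `foldScalable` function are hypothetical stand-ins, not LLVM's `StackOffset` API; the arithmetic follows the patch, where a scalable value of 8 corresponds to one vector register.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a stack offset with a fixed part (bytes) and a
// scalable part (in the patch's units: 8 per whole vector register).
struct Offset {
  int64_t Fixed;
  int64_t Scalable;
};

// Fold the scalable part into the fixed part when VLEN (in bits) is exactly
// known, i.e. when the minimum and maximum VLEN coincide.
inline Offset foldScalable(Offset Off, unsigned MinVLen, unsigned MaxVLen) {
  if (Off.Scalable == 0 || MinVLen != MaxVLen)
    return Off; // Nothing to fold, or VLEN is not exactly known.
  assert(Off.Scalable % 8 == 0 &&
         "Scalable offset is not a multiple of a single vector size.");
  int64_t NumOfVReg = Off.Scalable / 8; // Whole vector registers.
  int64_t VLENB = MinVLen / 8;          // Bytes per vector register.
  return {Off.Fixed + NumOfVReg * VLENB, 0};
}
```

With VLEN fixed at 128 bits (VLENB = 16), an offset of 16 fixed bytes plus one vector register folds to 32 bytes, which is consistent with the `addi a0, sp, 32` in the `SPILL-O2-VLEN128` check lines of this revision.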

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

reames created this revision. Nov 7 2022, 2:37 PM

Herald added a project: Restricted Project. Nov 7 2022, 2:37 PM

Herald added subscribers: sunshaoce, VincentWu, StephenFan and 29 others.

reames requested review of this revision. Nov 7 2022, 2:37 PM

Herald added a project: Restricted Project. Nov 7 2022, 2:37 PM

Herald added subscribers: alextsao1999, pcwang-thead, eopXD, MaskRay.

kito-cheng added inline comments. Nov 7 2022, 2:46 PM

llvm/test/CodeGen/RISCV/rvv/rv64-spill-vector-csr.ll
7	Should be SPILL-O2-VLEN256?
90	I thought no read of vlenb should appear here? According to the description, I'd expect we could just use 256 here.

kito-cheng added inline comments. Nov 7 2022, 2:51 PM

llvm/test/CodeGen/RISCV/rvv/rv64-spill-vector-csr.ll
6	There is also an old problem here: should we emit a warning or error when the zvl*b info mismatches `-riscv-v-vector-bits-min=`? The minimum VLEN is 128 according to `zvl128b` (which is implied by `v`), but we force it to 256 via `-riscv-v-vector-bits-min=`. That becomes an issue when we use `Tag_RISCV_arch` to check the expected minimum VLEN requirement for the object. How to interpret `Tag_RISCV_arch` is still controversial, but that will eventually be resolved in some way.

reames added a child revision: D137593: [RISCV] Optimize scalable frame setup when VLEN is precisely known. Nov 7 2022, 2:57 PM

kito-cheng added inline comments. Nov 7 2022, 3:00 PM

llvm/test/CodeGen/RISCV/rvv/rv64-spill-vector-csr.ll
90	My bad, that's a separate patch; I see https://reviews.llvm.org/D137593 now.

reames added inline comments. Nov 7 2022, 3:30 PM

llvm/test/CodeGen/RISCV/rvv/rv64-spill-vector-csr.ll
6	There's no conflict here. Zvl128b is a minimum, not a maximum. As such, specifying a larger VLEN via -riscv-v-vector-bits-min does not conflict. There's a separate question of whether -riscv-v-vector-bits-min should affect the attributes, but that's well out of scope for this patch.
7	This is a typo, and I'll fix.
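To make the minimum-versus-maximum point concrete, here is a small sketch of how the two bounds might be modeled. The `VLenBounds` struct and both helpers are hypothetical and are not LLVM's subtarget API; the sketch assumes a user-supplied minimum can only raise the floor implied by the architecture string, and that VLEN counts as exactly known only when both bounds coincide (the condition the patch checks).

```cpp
#include <algorithm>

// Hypothetical model of the VLEN bounds discussed above; not LLVM's API.
struct VLenBounds {
  unsigned Min; // Lower bound on VLEN, in bits.
  unsigned Max; // Upper bound on VLEN, in bits.
};

// Assumption: a user-supplied minimum only raises the floor set by the
// architecture string (e.g. 128 from zvl128b); it never lowers it.
inline VLenBounds withUserMin(VLenBounds Arch, unsigned UserMin) {
  return {std::max(Arch.Min, UserMin), Arch.Max};
}

// VLEN is exactly known only when the two bounds coincide.
inline bool isExactVLen(VLenBounds B) { return B.Min == B.Max; }
```

Under this model, raising the minimum from 128 to 256 is not a conflict; it simply narrows the range, and the range becomes exact only once the maximum is pinned to the same value.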

Harbormaster completed remote builds in B196576: Diff 473797. Nov 7 2022, 3:42 PM

Fix test line per reviewer comment

Harbormaster completed remote builds in B196619: Diff 473853. Nov 7 2022, 7:41 PM

This looks good to me.

This revision is now accepted and ready to land. Nov 8 2022, 3:15 AM

reames added inline comments. Nov 8 2022, 7:48 AM

llvm/test/CodeGen/RISCV/rvv/rv64-spill-vector-csr.ll
98	The offset here is wrong. I missed a divide by 8 in my code, which exists in the original getVLENFactoredAmount path. It's not really clear to me where that factor comes from, but what's here is definitely wrong in this test. Updated version forthcoming.

Fix bug around missing divide by 8.

I have to admit I don't know why that divide is needed. It parallels the code in getVLENFactoredAmount, but it's not clear to me where the factor of 8 comes from.

reames requested review of this revision. Nov 10 2022, 8:40 AM

In D137591#3919658, @reames wrote:

Fix bug around missing divide by 8.

I have to admit I don't know why that divide is needed. It parallels the code in getVLENFactoredAmount, but it's not clear to me where the factor of 8 comes from.

I think all scalable vector stack objects are rounded up to be a multiple of 8 by this code in RISCVFrameLowering::assignRVVStackObjectOffsets:

// If the data type is the fractional vector type, reserve one vector
// register for it.
if (ObjectSize < 8)
  ObjectSize = 8;

I think this is done because there is no fractional whole register load/store. The divide by 8 is trying to figure out how many whole registers there are.
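The reasoning in this comment can be sketched as follows. The helper names are hypothetical; only the `ObjectSize < 8` bump is taken from the quoted snippet, and the divide by 8 mirrors the patch's `NumOfVReg` computation. The sketch assumes, as the comment suggests, that a scalable size of 8 stands for one whole vector register.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: fractional vector types (scalable size below 8) are
// bumped to 8 so that they occupy one whole vector register on the stack.
inline int64_t reservedScalableSize(int64_t ObjectSize) {
  if (ObjectSize < 8)
    ObjectSize = 8;
  return ObjectSize;
}

// Hypothetical helper: number of whole vector registers covered by a
// scalable size, under the assumption that 8 means one register.
inline int64_t wholeRegisters(int64_t ScalableSize) {
  assert(ScalableSize % 8 == 0 &&
         "Scalable offset is not a multiple of a single vector size.");
  return ScalableSize / 8;
}
```

For example, a fractional object of scalable size 4 is reserved as 8 and counts as one register, while an object of scalable size 16 counts as two.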

llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
185	I'm sure you copied and pasted this from somewhere else, but this message isn't great. It should be something like "Scalable offset is not a multiple of a single vector size."

Harbormaster completed remote builds in B197089: Diff 474557. Nov 10 2022, 9:34 AM

Address reviewer comment on assert message

In D137591#3919786, @craig.topper wrote:
In D137591#3919658, @reames wrote:

Fix bug around missing divide by 8.

I have to admit I don't know why that divide is needed. It parallels the code in getVLENFactoredAmount, but it's not clear to me where the factor of 8 comes from.

I think all scalable vector stack objects are rounded up to be a multiple of 8 by this code in RISCVFrameLowering::assignRVVStackObjectOffsets:

// If the data type is the fractional vector type, reserve one vector
// register for it.
if (ObjectSize < 8)
  ObjectSize = 8;

I think this is done because there is no fractional whole register load/store. The divide by 8 is trying to figure out how many whole registers there are.

The bit I don't understand is the interpretation of object size for a scalable value. Why is "8" the right number for a single VLEN sized register?

Harbormaster completed remote builds in B197095: Diff 474567. Nov 10 2022, 10:38 AM

ping

LGTM

This revision is now accepted and ready to land. Nov 18 2022, 9:13 AM

This revision was landed with ongoing or failed builds. Nov 18 2022, 9:57 AM

Closed by commit rG18fda867f477: [RISCV] Optimize scalable frame offset calculation when VLEN is precisely known (authored by reames).

This revision was automatically updated to reflect the committed changes.

reames added a commit: rG18fda867f477: [RISCV] Optimize scalable frame offset calculation when VLEN is precisely known.

Diff 476516

llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp

...
 bool RISCVRegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator II,
                                             int SPAdj, unsigned FIOperandNum,
                                             RegScavenger *RS) const {
   assert(SPAdj == 0 && "Unexpected non-zero SPAdj value");

   MachineInstr &MI = *II;
   MachineFunction &MF = *MI.getParent()->getParent();
   MachineRegisterInfo &MRI = MF.getRegInfo();
-  const RISCVInstrInfo *TII = MF.getSubtarget<RISCVSubtarget>().getInstrInfo();
+  const RISCVSubtarget &ST = MF.getSubtarget<RISCVSubtarget>();
+  const RISCVInstrInfo *TII = ST.getInstrInfo();
   DebugLoc DL = MI.getDebugLoc();

   int FrameIndex = MI.getOperand(FIOperandNum).getIndex();
   Register FrameReg;
   StackOffset Offset =
       getFrameLowering(MF)->getFrameIndexReference(MF, FrameIndex, FrameReg);
   bool IsRVVSpill = RISCV::isRVVSpill(MI);
   if (!IsRVVSpill)
     Offset += StackOffset::getFixed(MI.getOperand(FIOperandNum + 1).getImm());

+  if (Offset.getScalable() &&
+      ST.getRealMinVLen() == ST.getRealMaxVLen()) {
+    // For an exact VLEN value, scalable offsets become constant and thus
+    // can be converted entirely into fixed offsets.
+    int64_t FixedValue = Offset.getFixed();
+    int64_t ScalableValue = Offset.getScalable();
+    assert(ScalableValue % 8 == 0 &&
+           "Scalable offset is not a multiple of a single vector size.");
+    int64_t NumOfVReg = ScalableValue / 8;
+    int64_t VLENB = ST.getRealMinVLen() / 8;
+    Offset = StackOffset::getFixed(FixedValue + NumOfVReg * VLENB);
+  }
+
   if (!isInt<32>(Offset.getFixed())) {
     report_fatal_error(
         "Frame offsets outside of the signed 32-bit range not supported");
   }

   MachineBasicBlock &MBB = *MI.getParent();
   bool FrameRegIsKill = false;
...

llvm/test/CodeGen/RISCV/rvv/rv64-spill-vector-csr.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -mtriple=riscv64 -mattr=+v,+d -mattr=+d -O0 < %s \
 ; RUN: | FileCheck --check-prefix=SPILL-O0 %s
 ; RUN: llc -mtriple=riscv64 -mattr=+v,+d -mattr=+d -O2 < %s \
 ; RUN: | FileCheck --check-prefix=SPILL-O2 %s
+; RUN: llc -mtriple=riscv64 -mattr=+v,+d -mattr=+d -riscv-v-vector-bits-max=128 -O2 < %s \
+; RUN: | FileCheck --check-prefix=SPILL-O2-VLEN128 %s

 @.str = private unnamed_addr constant [6 x i8] c"hello\00", align 1

 define <vscale x 1 x double> @foo(<vscale x 1 x double> %a, <vscale x 1 x double> %b, <vscale x 1 x double> %c, i64 %gvl) nounwind
 ; SPILL-O0-LABEL: foo:
 ; SPILL-O0: # %bb.0:
 ; SPILL-O0-NEXT: addi sp, sp, -48
 ; SPILL-O0-NEXT: sd ra, 40(sp) # 8-byte Folded Spill
...
 ; SPILL-O2-NEXT: vfadd.vv v8, v9, v8
 ; SPILL-O2-NEXT: csrr a0, vlenb
 ; SPILL-O2-NEXT: slli a0, a0, 1
 ; SPILL-O2-NEXT: add sp, sp, a0
 ; SPILL-O2-NEXT: ld ra, 24(sp) # 8-byte Folded Reload
 ; SPILL-O2-NEXT: ld s0, 16(sp) # 8-byte Folded Reload
 ; SPILL-O2-NEXT: addi sp, sp, 32
 ; SPILL-O2-NEXT: ret
+;
+; SPILL-O2-VLEN128-LABEL: foo:
+; SPILL-O2-VLEN128: # %bb.0:
+; SPILL-O2-VLEN128-NEXT: addi sp, sp, -32
+; SPILL-O2-VLEN128-NEXT: sd ra, 24(sp) # 8-byte Folded Spill
+; SPILL-O2-VLEN128-NEXT: sd s0, 16(sp) # 8-byte Folded Spill
+; SPILL-O2-VLEN128-NEXT: csrr a1, vlenb
+; SPILL-O2-VLEN128-NEXT: slli a1, a1, 1
+; SPILL-O2-VLEN128-NEXT: sub sp, sp, a1
+; SPILL-O2-VLEN128-NEXT: mv s0, a0
+; SPILL-O2-VLEN128-NEXT: addi a1, sp, 16
+; SPILL-O2-VLEN128-NEXT: vs1r.v v8, (a1) # Unknown-size Folded Spill
+; SPILL-O2-VLEN128-NEXT: vsetvli zero, a0, e64, m1, ta, ma
+; SPILL-O2-VLEN128-NEXT: vfadd.vv v9, v8, v9
+; SPILL-O2-VLEN128-NEXT: addi a0, sp, 32
+; SPILL-O2-VLEN128-NEXT: vs1r.v v9, (a0) # Unknown-size Folded Spill
+; SPILL-O2-VLEN128-NEXT: lui a0, %hi(.L.str)
+; SPILL-O2-VLEN128-NEXT: addi a0, a0, %lo(.L.str)
+; SPILL-O2-VLEN128-NEXT: call puts@plt
+; SPILL-O2-VLEN128-NEXT: vsetvli zero, s0, e64, m1, ta, ma
+; SPILL-O2-VLEN128-NEXT: addi a0, sp, 32
+; SPILL-O2-VLEN128-NEXT: vl1r.v v8, (a0) # Unknown-size Folded Reload
+; SPILL-O2-VLEN128-NEXT: addi a0, sp, 16
+; SPILL-O2-VLEN128-NEXT: vl1r.v v9, (a0) # Unknown-size Folded Reload
+; SPILL-O2-VLEN128-NEXT: vfadd.vv v8, v9, v8
+; SPILL-O2-VLEN128-NEXT: csrr a0, vlenb
+; SPILL-O2-VLEN128-NEXT: slli a0, a0, 1
+; SPILL-O2-VLEN128-NEXT: add sp, sp, a0
+; SPILL-O2-VLEN128-NEXT: ld ra, 24(sp) # 8-byte Folded Reload
+; SPILL-O2-VLEN128-NEXT: ld s0, 16(sp) # 8-byte Folded Reload
+; SPILL-O2-VLEN128-NEXT: addi sp, sp, 32
+; SPILL-O2-VLEN128-NEXT: ret
 {
   %x = call <vscale x 1 x double> @llvm.riscv.vfadd.nxv1f64.nxv1f64(<vscale x 1 x double> undef, <vscale x 1 x double> %a, <vscale x 1 x double> %b, i64 %gvl)
   %call = call signext i32 @puts(i8* getelementptr inbounds ([6 x i8], [6 x i8]* @.str, i64 0, i64 0))
   %z = call <vscale x 1 x double> @llvm.riscv.vfadd.nxv1f64.nxv1f64(<vscale x 1 x double> undef, <vscale x 1 x double> %a, <vscale x 1 x double> %x, i64 %gvl)
   ret <vscale x 1 x double> %z
 }

 declare <vscale x 1 x double> @llvm.riscv.vfadd.nxv1f64.nxv1f64(<vscale x 1 x double> %passthru, <vscale x 1 x double> %a, <vscale x 1 x double> %b, i64 %gvl)
 declare i32 @puts(i8*);

This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Optimize scalable frame offset calculation when VLEN is precisely known
Closed, Public
