This is an archive of the discontinued LLVM Phabricator instance.

Differential D104727

[RISCV] Permit larger RVV stacks and stack offsets
ClosedPublic

Authored by frasercrmck on Jun 22 2021, 10:58 AM.

Details

Reviewers

craig.topper
HsiangKai
rogfer01
evandro
khchen
arcbbb

Commits

rGab1bd255939e: [RISCV] Permit larger RVV stacks and stack offsets

Summary

This patch teaches the compiler to generate code to handle larger RVV
stack sizes and stack offsets which resolve to an amount larger than 2047
vector registers in size.

The previous behaviour was asserting on such large values as it was only
able to materialize the constant by feeding it to the 12-bit immediate
of an ADDI instruction. The compiler can now materialize this amount
into a temporary register before continuing with the computation.
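
As a rough illustration of the limit described above (a sketch, not the patch itself): ADDI takes a 12-bit signed immediate, so a single ADDI from x0 can only produce values in [-2048, 2047]. The names fitsSigned, Materialize and chooseMaterialization below are hypothetical and exist only for this example; fitsSigned stands in for LLVM's isInt<N>.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for llvm::isInt<N>: does x fit in an N-bit signed integer?
template <unsigned N> constexpr bool fitsSigned(int64_t x) {
  static_assert(N > 0 && N < 64, "sketch only handles widths of 1..63 bits");
  return x >= -(int64_t{1} << (N - 1)) && x < (int64_t{1} << (N - 1));
}

// How the register count is placed in a register (names are illustrative).
enum class Materialize { SingleAddi, MovImmSequence };

inline Materialize chooseMaterialization(int64_t numVRegs) {
  // Mirrors the relaxed assertion in the patch: the count must fit in 32 bits.
  assert(fitsSigned<32>(numVRegs) && "count must fit in 32 bits");
  // Counts that fit the ADDI immediate need one instruction; anything larger
  // has to be built in a temporary register by a longer sequence.
  return fitsSigned<12>(numVRegs) ? Materialize::SingleAddi
                                  : Materialize::MovImmSequence;
}
```

Under these assumptions 2047 still takes the single-ADDI path, while 2048 and the 3072 used by the test take the longer one.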

A test case for this scenario is included which also checks that the
temporary register used to materialize the amount doesn't require an
additional spill slot over what we're already reserving for RVV code.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

frasercrmck created this revision. Jun 22 2021, 10:58 AM

Herald added subscribers: vkmr, luismarques, apazos and 22 others. Jun 22 2021, 10:58 AM

frasercrmck requested review of this revision. Jun 22 2021, 10:58 AM

Herald added a project: Restricted Project. Jun 22 2021, 10:58 AM

Herald added subscribers: llvm-commits, MaskRay.

Strictly speaking it's not 12-bit stack offsets we're now supporting but rather a 12-bit "number of vector registers". For whatever reason I was unable to find a concise way of explaining this in the commit message/description so if anyone's got a better way I'm all ears.

"Support more than 2047 vector register stack slots"?

Although this isn't even that, this is "Support spilling more than 2047 vector elements"?

Harbormaster completed remote builds in B110450: Diff 353714. Jun 22 2021, 12:00 PM

In D104727#2833872, @jrtc27 wrote:

Although this isn't even that, this is "Support spilling more than 2047 vector elements"?

Yeah, sort of? It is a fairly generic function used both to set up the stack by decrementing the stack pointer by the RVV stack size and also to fix up frame indices, so it's not strictly equal to spilling this or that number of registers. Stack slots is perhaps a little better, actually, but I could still think of some pathological case where I have relatively few vectors on the stack but with really high alignments that effectively push up the stack size and thus the offsets. Maybe I should just say that we can now support larger "VLEN-factored amounts"? That's sort of meaningless to the casual reader though. Oh what a can of worms.

Strictly speaking it's not 12-bit stack offsets we're now supporting but rather a 12-bit "number of vector registers". For whatever reason I was unable to find a concise way of explaining this in the commit message/description so if anyone's got a better way I'm all ears.

I think "number of vector registers" may be good enough. This seems difficult to describe because the offsets in the scalable vector stackid have their offsets conceptually scaled to the smallest vscale possible (=1), so the code counts how many registers (currently 8 bytes per VR under vscale=1) are contained in (the unscaled) Amount to later scale that using vlenb.

I'm curious when you hit this. Asking because, unless I'm reading the testcase wrong, you're computing a NumOfVReg that seems huge (3072?)

Other than that LGTM. Thanks @frasercrmck
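
The counting described in this comment can be sketched as follows. This is a minimal illustration, not the code under review: numVRegs and scaledBytes are hypothetical names, and the vlenb value passed in is whatever the hardware reports (16 below assumes VLEN = 128 bits).

```cpp
#include <cassert>
#include <cstdint>

// Bytes per vector register when vscale == 1, as stated in the comment.
constexpr int64_t kBytesPerVRegAtVscale1 = 8;

// Number of VLEN-sized registers contained in the unscaled amount.
inline int64_t numVRegs(int64_t amount) {
  assert(amount % kBytesPerVRegAtVscale1 == 0 &&
         "amount must be a multiple of one vector register");
  return amount / kBytesPerVRegAtVscale1;
}

// Actual byte size once the register count is scaled by vlenb.
inline int64_t scaledBytes(int64_t amount, int64_t vlenb) {
  return numVRegs(amount) * vlenb;
}
```

For the 24576-byte scalable-vector spill slot in the test, numVRegs gives 3072, which is the count the reviewer asks about; with an assumed vlenb of 16 that corresponds to 49152 bytes of stack.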

This revision is now accepted and ready to land. Jun 24 2021, 2:28 AM

In D104727#2838057, @rogfer01 wrote:

Strictly speaking it's not 12-bit stack offsets we're now supporting but rather a 12-bit "number of vector registers". For whatever reason I was unable to find a concise way of explaining this in the commit message/description so if anyone's got a better way I'm all ears.

I think "number of vector registers" may be good enough. This seems difficult to describe because the offsets in the scalable vector stackid have their offsets conceptually scaled to the smallest vscale possible (=1), so the code counts how many registers (currently 8 bytes per VR under vscale=1) are contained in (the unscaled) Amount to later scale that using vlenb.

I'm curious when you hit this. Asking because, unless I'm reading the testcase wrong, you're computing a NumOfVReg that seems huge (3072?)

Other than that LGTM. Thanks @frasercrmck

Yeah I suppose the code says it best: it's the number of VLEN-sized registers. I have no idea why I find this part of the code so unintuitive. Probably for the reason you describe.

As for how we reached this situation, I was really trying to "torture test" the code generator by really pushing the vectorization factor in our OpenCL implementation. Basically it ended up force-vectorizing kernels containing things like <16 x i64> up by vscale x 16 and so was ending up with nonsensically wide vectors like <vscale x 256 x i64> which must of course spill. I'm not claiming that this is the sole reason as I've seen some particularly bad code generation even at the lower VFs (anything larger than vscale x 1 can end up spilling even for small kernels). But yeah if there's ever a reason to spill a silly amount of registers, this'd be it. The good news is that this was the only failure I saw across the test suite!

The value in the test case is just a pseudo-random number I chose that's higher than 2048 but isn't near a power of two (which would make the code choose a shift instead). I haven't been through all the "real" failures: the only one I looked at was somewhere in the 2000s.
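
To see why a value like 3072 ends up on the multiply path, here is a hedged sketch of the selection order. Only the power-of-two check and the "NumOfVReg - 1" check are visible in the diff excerpt; the add step for that second case and the "NumOfVReg + 1" condition guarding the shift-then-subtract sequence are assumptions inferred from the arithmetic. ScaleStrategy, isPowerOfTwo and chooseScaleStrategy are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

enum class ScaleStrategy {
  UseVlenbDirectly, // count == 1: vlenb is already the answer
  Shift,            // count == 2^k: vlenb << k
  ShiftThenAdd,     // count == 2^k + 1: (vlenb << k) + vlenb (assumed body)
  ShiftThenSub,     // count == 2^k - 1: (vlenb << k) - vlenb (assumed condition)
  Multiply          // anything else: materialize count, then MUL
};

constexpr bool isPowerOfTwo(int64_t x) { return x > 0 && (x & (x - 1)) == 0; }

inline ScaleStrategy chooseScaleStrategy(int64_t numVRegs) {
  assert(numVRegs > 0 && "sketch only covers positive register counts");
  if (isPowerOfTwo(numVRegs))
    return numVRegs == 1 ? ScaleStrategy::UseVlenbDirectly
                         : ScaleStrategy::Shift;
  if (isPowerOfTwo(numVRegs - 1))
    return ScaleStrategy::ShiftThenAdd;
  if (isPowerOfTwo(numVRegs + 1)) // assumption: condition is elided in the excerpt
    return ScaleStrategy::ShiftThenSub;
  return ScaleStrategy::Multiply;
}
```

Since 3071, 3072 and 3073 are all not powers of two, 3072 falls through to the multiply case, which is the only path that needs the constant in a register.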

rebase

frasercrmck retitled this revision from [RISCV] Permit RVV stack offsets larger than 12 bits to [RISCV] Permit larger RVV stacks and stack offsets. Jun 24 2021, 8:53 AM

frasercrmck edited the summary of this revision.

Harbormaster completed remote builds in B110841: Diff 354267. Jun 24 2021, 9:34 AM

remove explicit use of TII->

Harbormaster completed remote builds in B110950: Diff 354431. Jun 24 2021, 11:23 PM

Closed by commit rGab1bd255939e: [RISCV] Permit larger RVV stacks and stack offsets (authored by frasercrmck). Jun 24 2021, 11:26 PM

This revision was automatically updated to reflect the committed changes.

frasercrmck added a commit: rGab1bd255939e: [RISCV] Permit larger RVV stacks and stack offsets.

Revision Contents

Path    Size
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp    13 lines
llvm/test/CodeGen/RISCV/rvv/large-rvv-stack-size.mir    92 lines

Diff 354438

llvm/lib/Target/RISCV/RISCVInstrInfo.cpp

[1,467 preceding lines not shown]
   assert(Amount % 8 == 0 &&
          "Reserve the stack by the multiple of one vector size.");

   MachineRegisterInfo &MRI = MF.getRegInfo();
   const RISCVInstrInfo *TII = MF.getSubtarget<RISCVSubtarget>().getInstrInfo();
   int64_t NumOfVReg = Amount / 8;

   Register VL = MRI.createVirtualRegister(&RISCV::GPRRegClass);
   BuildMI(MBB, II, DL, TII->get(RISCV::PseudoReadVLENB), VL);
-  assert(isInt<12>(NumOfVReg) &&
-         "Expect the number of vector registers within 12-bits.");
+  assert(isInt<32>(NumOfVReg) &&
+         "Expect the number of vector registers within 32-bits.");
   if (isPowerOf2_32(NumOfVReg)) {
     uint32_t ShiftAmount = Log2_32(NumOfVReg);
     if (ShiftAmount == 0)
       return VL;
     BuildMI(MBB, II, DL, TII->get(RISCV::SLLI), VL)
         .addReg(VL, RegState::Kill)
         .addImm(ShiftAmount);
   } else if (isPowerOf2_32(NumOfVReg - 1)) {
[11 lines not shown]
     BuildMI(MBB, II, DL, TII->get(RISCV::SLLI), ScaledRegister)
         .addReg(VL)
         .addImm(ShiftAmount);
     BuildMI(MBB, II, DL, TII->get(RISCV::SUB), VL)
         .addReg(ScaledRegister, RegState::Kill)
         .addReg(VL, RegState::Kill);
   } else {
     Register N = MRI.createVirtualRegister(&RISCV::GPRRegClass);
-    BuildMI(MBB, II, DL, TII->get(RISCV::ADDI), N)
-        .addReg(RISCV::X0)
-        .addImm(NumOfVReg);
+    if (!isInt<12>(NumOfVReg))
+      movImm(MBB, II, DL, N, NumOfVReg);
+    else
+      BuildMI(MBB, II, DL, TII->get(RISCV::ADDI), N)
+          .addReg(RISCV::X0)
+          .addImm(NumOfVReg);
     if (!MF.getSubtarget<RISCVSubtarget>().hasStdExtM())
       MF.getFunction().getContext().diagnose(DiagnosticInfoUnsupported{
           MF.getFunction(),
           "M-extension must be enabled to calculate the vscaled size/offset."});
     BuildMI(MBB, II, DL, TII->get(RISCV::MUL), VL)
         .addReg(VL, RegState::Kill)
         .addReg(N, RegState::Kill);
   }
[84 following lines not shown]

llvm/test/CodeGen/RISCV/rvv/large-rvv-stack-size.mir

This file was added.

# NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
# RUN: llc -mtriple riscv64 -mattr=+m,+experimental-v -start-before=prologepilog -o - \
# RUN: -verify-machineinstrs %s | FileCheck %s
--- |
  target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n64-S128"
  target triple = "riscv64"

  define void @spillslot() {
  ; CHECK-LABEL: spillslot:
  ; CHECK: # %bb.0:
  ; CHECK-NEXT: addi sp, sp, -2032
  ; CHECK-NEXT: .cfi_def_cfa_offset 2032
  ; CHECK-NEXT: sd ra, 2024(sp) # 8-byte Folded Spill
  ; CHECK-NEXT: sd s0, 2016(sp) # 8-byte Folded Spill
  ; CHECK-NEXT: .cfi_offset ra, -8
  ; CHECK-NEXT: .cfi_offset s0, -16
  ; CHECK-NEXT: addi s0, sp, 2032
  ; CHECK-NEXT: .cfi_def_cfa s0, 0
  ; CHECK-NEXT: addi sp, sp, -272
  ; CHECK-NEXT: sd a0, 8(sp)
  ; CHECK-NEXT: csrr a0, vlenb
  ; CHECK-NEXT: sd a1, 0(sp)
  ; CHECK-NEXT: lui a1, 1
  ; CHECK-NEXT: addiw a1, a1, -1024
  ; CHECK-NEXT: mul a0, a0, a1
  ; CHECK-NEXT: ld a1, 0(sp)
  ; CHECK-NEXT: sub sp, sp, a0
  ; CHECK-NEXT: andi sp, sp, -128
  ; CHECK-NEXT: lui a0, 1
  ; CHECK-NEXT: addiw a0, a0, -1808
  ; CHECK-NEXT: add a0, sp, a0
  ; CHECK-NEXT: vs1r.v v25, (a0) # Unknown-size Folded Spill
  ; CHECK-NEXT: ld a0, 8(sp)
  ; CHECK-NEXT: call spillslot@plt
  ; CHECK-NEXT: lui a0, 1
  ; CHECK-NEXT: addiw a0, a0, -1792
  ; CHECK-NEXT: sub sp, s0, a0
  ; CHECK-NEXT: addi sp, sp, 272
  ; CHECK-NEXT: ld s0, 2016(sp) # 8-byte Folded Reload
  ; CHECK-NEXT: ld ra, 2024(sp) # 8-byte Folded Reload
  ; CHECK-NEXT: addi sp, sp, 2032
  ; CHECK-NEXT: ret
    ret void
  }

...
---
name: spillslot
alignment: 4
tracksRegLiveness: false
frameInfo:
  isFrameAddressTaken: false
  isReturnAddressTaken: false
  hasStackMap: false
  hasPatchPoint: false
  stackSize: 0
  offsetAdjustment: 0
  maxAlignment: 128
  adjustsStack: false
  hasCalls: false
  stackProtector: ''
  maxCallFrameSize: 4294967295
  cvBytesOfCalleeSavedRegisters: 0
  hasOpaqueSPAdjustment: false
  hasVAStart: false
  hasMustTailInVarArgFunc: false
  hasTailCall: false
  localFrameSize: 0
  savePoint: ''
  restorePoint: ''
fixedStack: []
stack:
  - { id: 0, name: '', type: default, offset: 0, size: 2048, alignment: 128,
      stack-id: default, callee-saved-register: '', callee-saved-restored: true,
      debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
  - { id: 1, name: '', type: spill-slot, offset: 0, size: 24576, alignment: 8,
      stack-id: scalable-vector, callee-saved-register: '', callee-saved-restored: true,
      debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
body: |
  bb.0:
    liveins: $x1, $x5, $x6, $x7, $x10, $x11, $x12, $x13, $x14, $x15, $x16, $x17, $x28, $x29, $x30, $x31, $v25

    PseudoVSPILL_M1 killed renamable $v25, %stack.1 :: (store unknown-size into %stack.1, align 8)
    ; This is here just to make all the eligible registers live at this point.
    ; This way when we replace the frame index %stack.1 with its actual address
    ; we have to allocate two virtual registers to compute it.
    ; A later run of the register scavenger won't find available registers
    ; either so it will have to spill two to the emergency spill slots
    ; required for this RVV computation.
    PseudoCALL target-flags(riscv-plt) @spillslot, csr_ilp32_lp64, implicit-def $x1, implicit-def $x2, implicit $x1, implicit $x5, implicit $x6, implicit $x7, implicit $x10, implicit $x11, implicit $x12, implicit $x13, implicit $x14, implicit $x15, implicit $x16, implicit $x17, implicit $x28, implicit $x29, implicit $x30, implicit $x31
    PseudoRET
...
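
The lui/addiw pairs in the CHECK lines can be verified by hand. The following is a minimal sketch, assuming the usual RISC-V split of a constant into a 20-bit upper part for LUI and a sign-extended 12-bit low part for the following add; LuiAddi and splitImm are hypothetical names, and the helper only handles non-negative values that fit in 32 bits.

```cpp
#include <cassert>
#include <cstdint>

// Upper and lower parts such that (hi20 << 12) + lo12 == value.
struct LuiAddi {
  int64_t hi20; // operand of LUI
  int64_t lo12; // sign-extended 12-bit operand of the following ADDI/ADDIW
};

inline LuiAddi splitImm(int64_t value) {
  assert(value >= 0 && value <= INT32_MAX && "sketch handles small positives only");
  // Adding 0x800 rounds the upper part up whenever the low 12 bits would be
  // interpreted as negative after sign extension.
  int64_t hi20 = (value + 0x800) >> 12;
  int64_t lo12 = value - (hi20 << 12);
  return {hi20, lo12};
}
```

For 3072 this gives hi20 = 1 and lo12 = -1024, matching "lui a1, 1" followed by "addiw a1, a1, -1024" in the prologue. The other two pairs in the test decode the same way: 1 and -1808 correspond to 2288, and 1 and -1792 correspond to 2304.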