This is an archive of the discontinued LLVM Phabricator instance.

Differential D125962

[RISCV] Add a test showing overlapping stack offsets with RVV
ClosedPublic

Authored by frasercrmck on May 19 2022, 3:32 AM.

Details

Reviewers

rogfer01
HsiangKai
reames
kito-cheng
craig.topper
StephenFan

Commits

rGa351070710f5: [RISCV] Add a test showing overlapping stack offsets with RVV

Summary

This test (and its forthcoming fix) was split off from D125787. It shows
that the logic we use to determine when we need to add extra RVV padding
is insufficient.

In this example, we may have a situation involving dynamic stack
alignment -- but no variable-sized objects -- where we have no FP but
must still use SP to index objects. In this case we also need the
extra RVV padding, otherwise objects may overlap. Specifically, the test
shows that the RVV vector object may clobber the lowest callee-save.

|------------------------------| -- <-- Incoming SP
| 4-byte callee-save (ra)      |
|------------------------------| -- <-- SP + VLENB*2 + 60
| 4-byte callee-save (s0)      |
|------------------------------| -- <-- SP + VLENB*2 + 56  --
| 4-byte callee-save (s9)      |                            |
|------------------------------| -- <-- SP + VLENB*2 + 52   | RVV object(!!)
| VLENB*2 RVV object           |                            |
|------------------------------| -- <-- SP + 56            --
| 4-byte local object          | 
|------------------------------| -- <-- SP + 32
| Dead area                    |
|------------------------------| -- <-- InSP - 2*VLENB - 64
| Possibly-zero realignment    | 
|------------------------------| -- <-- SP (realigned to 32)

This diagram should help show that when SP==InSP -- e.g., when the incoming SP
is 32-byte aligned, subtracting 2*VLENB+64 may keep it that way -- the RVV
object clobbers the spill of s9.
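
The overlap in the diagram can be checked with a little arithmetic. What follows is a minimal Python sketch, not part of the patch: the function name `overlapping_bytes` and the `realign_gap` parameter are illustrative assumptions, and the offsets are the ones from the diagram (RVV object at SP+56, s9 spill at SP+VLENB*2+52).

```python
def overlapping_bytes(vlenb: int, realign_gap: int = 0) -> int:
    """Bytes shared by the RVV object and the s9 spill slot.

    All offsets are relative to the final SP. realign_gap is the number of
    bytes by which `andi sp, sp, -32` lowered SP; it is 0 in the
    SP == InSP - 2*VLENB - 64 case described in the summary.
    """
    rvv_start = 56                            # addi a0, sp, 56
    rvv_end = rvv_start + 2 * vlenb           # VLENB*2-byte RVV object
    s9_start = realign_gap + 2 * vlenb + 52   # sw s9, 52(sp), issued before SP was lowered
    s9_end = s9_start + 4                     # 4-byte callee-save
    return max(0, min(rvv_end, s9_end) - max(rvv_start, s9_start))


for vlenb in (16, 32, 64):
    # Case from the diagram: realignment leaves SP untouched.
    print(f"VLENB={vlenb}: {overlapping_bytes(vlenb)} overlapping bytes")
```

With `realign_gap=0` the sketch reports that all 4 bytes of the s9 slot lie inside the RVV object for each VLENB tried. With `realign_gap=16` the two ranges are disjoint, which is why the bug only shows up when the realignment happens to be a no-op.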

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

frasercrmck created this revision. May 19 2022, 3:32 AM

Herald added a project: Restricted Project. May 19 2022, 3:32 AM

Herald added subscribers: sunshaoce, VincentWu, luke957 and 25 others.

frasercrmck requested review of this revision. May 19 2022, 3:32 AM

Herald added a project: Restricted Project. May 19 2022, 3:32 AM

Herald added subscribers: llvm-commits, pcwang-thead, eopXD, MaskRay.

frasercrmck mentioned this in D125964: [RISCV] Fix logic for determining RVV stack padding. May 19 2022, 3:50 AM

frasercrmck added a child revision: D125964: [RISCV] Fix logic for determining RVV stack padding. May 19 2022, 3:51 AM

frasercrmck mentioned this in D125787: [RISCV] Fix RVV stack frame alignment bugs. May 19 2022, 3:56 AM

Harbormaster completed remote builds in B165293: Diff 430626. May 19 2022, 4:11 AM

LGTM

This revision is now accepted and ready to land. May 19 2022, 8:27 AM

This revision was landed with ongoing or failed builds. May 20 2022, 5:11 AM

Closed by commit rGa351070710f5: [RISCV] Add a test showing overlapping stack offsets with RVV (authored by frasercrmck).

This revision was automatically updated to reflect the committed changes.

frasercrmck added a commit: rGa351070710f5: [RISCV] Add a test showing overlapping stack offsets with RVV.

frasercrmck mentioned this in rGd60ae47f9dab: [RISCV] Fix logic for determining RVV stack padding. May 20 2022, 5:31 AM

Hi @frasercrmck ,

May I ask why the generated asm is trying to store a0 (x10)?
The RISC-V calling convention specifies that a0 is a caller saved register.

Regards,

eop Chen

Hi. The test is arbitrary and is storing s9/$x25 just to have something storing to a scalar stack slot to exercise the frame lowering; it's not a real callee spill.

Since $x25 is a copy of a0/$x10, eventually the machine copy propagation pass removes the copy and that's why it becomes a store of a0.

We could update the tests to use a callee-saved register for readability, but this isn't a correctness issue.
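
To illustrate the effect described here, this is a toy forward copy propagation in Python. It is only a sketch under loud assumptions: it is not LLVM's MachineCopyPropagation pass, the helper name `propagate_copies` is made up, instructions are plain strings, every register on a non-COPY instruction is treated as a use, and dead copies are not removed.

```python
import re


def propagate_copies(instrs):
    """Rewrite register uses to the source of an earlier COPY (toy model)."""
    copies = {}  # COPY destination -> original source register
    out = []
    for ins in instrs:
        m = re.fullmatch(r"(\$\w+) = COPY (\$\w+)", ins)
        if m:
            dst, src = m.groups()
            # Redefining dst invalidates any recorded copy that involves it.
            copies = {d: s for d, s in copies.items() if dst not in (d, s)}
            copies[dst] = copies.get(src, src)
            out.append(ins)
            continue
        # Toy simplification: treat every register operand as a use.
        out.append(re.sub(r"\$\w+", lambda r: copies.get(r.group(0), r.group(0)), ins))
    return out


body = [
    "$x25 = COPY $x10",
    "SW renamable $x25, %stack.0, 0 :: (store (s32) into %stack.0)",
    "PseudoRET",
]
for line in propagate_copies(body):
    print(line)
```

In this model the store of $x25 is rewritten to a store of $x10, which corresponds to the `sw a0, ...` seen in the generated assembly.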

Thank you for the reply. I think I am getting closer to understanding the whole thing.

So after the machine copy propagation, the code looks like the following.
s9/$x25 contains a value (so we need to callee-save it), and a0/$x10 is a spill-slot object (so we need to callee-save it too, along with $v30), right?

$x25 = COPY $x10
SW renamable $x10, %stack.0, 0 :: (store (s32) into %stack.0)
PseudoVSPILL_M2 renamable $v30m2, %stack.1 :: (store unknown-size into %stack.1, align 8)
PseudoRET

Yes, we callee-save s9/$x25 because we're clobbering it in the test. That spill (sw s9, 52(sp)) is automatically generated by the frame lowering code.

But no, I wouldn't say we're "callee-saving" a0/$x10 or $v30. Personally I think of the term "callee-saving" as describing the process of saving registers for the purposes of the calling convention. But in this test we're only storing a0/$x10 and $v30 to the stack because we've concocted a test with those stores in it. We're just demonstrating some codegen in an arbitrary fashion. The stores/spills are unnecessary from a calling convention point of view, but LLVM isn't going to (or able to) remove such unnecessary stack spills - at least at this stage of the pipeline.

I think maybe the type: spill-slot is possibly leading to some confusion? Sorry about that - I just copy/pasted from other tests. I think type: default would be more accurate.

I hope that helps and I'm not confusing you further :)
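
For reference, the register naming used in this thread can be sketched in Python as follows. The mapping follows the standard RISC-V integer calling convention (x10-x17 are a0-a7, x18-x27 are s2-s11); the helper names `abi_name` and `is_callee_saved` are hypothetical and exist only for this illustration.

```python
def abi_name(x: int) -> str:
    """ABI mnemonic for integer register x0..x31 (standard RISC-V convention)."""
    fixed = {0: "zero", 1: "ra", 2: "sp", 3: "gp", 4: "tp", 8: "s0", 9: "s1"}
    if x in fixed:
        return fixed[x]
    if 5 <= x <= 7:
        return f"t{x - 5}"    # t0-t2
    if 10 <= x <= 17:
        return f"a{x - 10}"   # a0-a7
    if 18 <= x <= 27:
        return f"s{x - 16}"   # s2-s11
    if 28 <= x <= 31:
        return f"t{x - 25}"   # t3-t6
    raise ValueError(f"no such integer register: x{x}")


def is_callee_saved(x: int) -> bool:
    """True for sp and s0-s11, the registers preserved across calls."""
    return abi_name(x).startswith("s")


for x in (10, 25):
    print(f"x{x} = {abi_name(x)}, callee-saved: {is_callee_saved(x)}")
```

This matches the discussion: $x25 is s9, which the prologue must save once the function clobbers it, while $x10 is a0, which the calling convention does not require the callee to preserve.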

I see. Thank you very much for the explanation!

Revision Contents

Path: llvm/test/CodeGen/RISCV/rvv/wrong-stack-slot-rv32.mir
Size: 49 lines

Diff 430944

llvm/test/CodeGen/RISCV/rvv/wrong-stack-slot-rv32.mir

 # NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 # RUN: llc -mtriple=riscv32 -mattr=+m,+v -o - %s \
 # RUN: -start-before=prologepilog | FileCheck %s
 #
-# This test checks that we are assigning the right stack slot to GPRs and to
+# These tests check that we are assigning the right stack slot to GPRs and to
 # vector registers (VRs). If this test changes, make sure there is no overlap
 # between slots for GPRs and VRs.
 --- |
   define void @foo() #0 {
   ; CHECK-LABEL: foo:
   ; CHECK: # %bb.0: # %entry
   ; CHECK-NEXT: addi sp, sp, -32
   ; CHECK-NEXT: sw s9, 28(sp) # 4-byte Folded Spill
   ; CHECK-NEXT: csrr a1, vlenb
   ; CHECK-NEXT: slli a1, a1, 1
   ; CHECK-NEXT: sub sp, sp, a1
   ; CHECK-NEXT: sw a0, 8(sp) # 4-byte Folded Spill
   ; CHECK-NEXT: addi a0, sp, 16
   ; CHECK-NEXT: vs2r.v v30, (a0) # Unknown-size Folded Spill
   ; CHECK-NEXT: csrr a0, vlenb
   ; CHECK-NEXT: slli a0, a0, 1
   ; CHECK-NEXT: add sp, sp, a0
   ; CHECK-NEXT: lw s9, 28(sp) # 4-byte Folded Reload
   ; CHECK-NEXT: addi sp, sp, 32
   ; CHECK-NEXT: ret
   entry:
     ret void
   }

+  ; FIXME: If the stack realignment does nothing to sp (a possibility) then
+  ; the vlenb*2-sized RVV stack object at sp+56 overlaps with the slot
+  ; allocated to spilling s9 (sp+52+vlenb*2)
+  define void @rvv_clobbers_callee_save() #0 {
+  ; CHECK-LABEL: rvv_clobbers_callee_save:
+  ; CHECK: # %bb.0: # %entry
+  ; CHECK-NEXT: addi sp, sp, -64
+  ; CHECK-NEXT: sw ra, 60(sp) # 4-byte Folded Spill
+  ; CHECK-NEXT: sw s0, 56(sp) # 4-byte Folded Spill
+  ; CHECK-NEXT: sw s9, 52(sp) # 4-byte Folded Spill
+  ; CHECK-NEXT: addi s0, sp, 64
+  ; CHECK-NEXT: csrr a1, vlenb
+  ; CHECK-NEXT: slli a1, a1, 1
+  ; CHECK-NEXT: sub sp, sp, a1
+  ; CHECK-NEXT: andi sp, sp, -32
+  ; CHECK-NEXT: sw a0, 32(sp) # 4-byte Folded Spill
+  ; CHECK-NEXT: addi a0, sp, 56
+  ; CHECK-NEXT: vs2r.v v30, (a0) # Unknown-size Folded Spill
+  ; CHECK-NEXT: addi sp, s0, -64
+  ; CHECK-NEXT: lw ra, 60(sp) # 4-byte Folded Reload
+  ; CHECK-NEXT: lw s0, 56(sp) # 4-byte Folded Reload
+  ; CHECK-NEXT: lw s9, 52(sp) # 4-byte Folded Reload
+  ; CHECK-NEXT: addi sp, sp, 64
+  ; CHECK-NEXT: ret
+  entry:
+    ret void
+  }
+
   attributes #0 = { nounwind }
 ...
 ---
 name: foo
 alignment: 2
 frameInfo:
   maxAlignment: 8
 stack:
   - { id: 0, type: spill-slot, size: 4, alignment: 4 }
   - { id: 1, type: spill-slot, size: 16, alignment: 8, stack-id: scalable-vector }
 machineFunctionInfo: {}
 body: |
   bb.0.entry:
     liveins: $x10, $v30m2

     $x25 = COPY $x10
     SW renamable $x25, %stack.0, 0 :: (store (s32) into %stack.0)
     PseudoVSPILL_M2 renamable $v30m2, %stack.1 :: (store unknown-size into %stack.1, align 8)
     PseudoRET

 ...
+---
+name: rvv_clobbers_callee_save
+alignment: 2
+frameInfo:
+  maxAlignment: 8
+stack:
+  - { id: 0, type: spill-slot, size: 4, alignment: 32 }
+  - { id: 1, type: spill-slot, size: 16, alignment: 8, stack-id: scalable-vector }
+machineFunctionInfo: {}
+body: |
+  bb.0.entry:
+    liveins: $x10, $v30m2
+
+    $x25 = COPY $x10
+    SW renamable $x25, %stack.0, 0 :: (store (s32) into %stack.0)
+    PseudoVSPILL_M2 renamable $v30m2, %stack.1 :: (store unknown-size into %stack.1, align 8)
+    PseudoRET
+
+...