This is an archive of the discontinued LLVM Phabricator instance.


Differential D91717

[RISCV][compiler-rt] Add support for save-restore
ClosedPublic

Authored by edward-jones on Nov 18 2020, 8:30 AM.


Details

Reviewers

luismarques
asb
lenary

Commits

rGb136a74efc54: [RISCV][compiler-rt] Add support for save-restore

Summary

This adds the compiler-rt entry points required by the `-msave-restore` option. The added entry points are `__riscv_save_X` and `__riscv_restore_X`, where X is the number of callee-saved registers to be saved or restored, respectively.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

edward-jones created this revision. · Nov 18 2020, 8:30 AM

Herald added a project: Restricted Project. · Nov 18 2020, 8:30 AM

Herald added subscribers: Restricted Project, frasercrmck, NickHung and 28 others.

edward-jones requested review of this revision. · Nov 18 2020, 8:30 AM

edward-jones added a child revision: D91720: [RISCV][compiler-rt] Add __riscv_restore_tailcall_N entry points. · Nov 18 2020, 8:36 AM

Harbormaster completed remote builds in B79305: Diff 306122. · Nov 18 2020, 9:15 AM

It seems a bit excessive to me to coalesce the entry points into bundles of 4. Do you have any particular benchmarking data or reasoning that supports choosing that threshold?
Also, shouldn't this implementation include CFI directives?

compiler-rt/lib/builtins/riscv/save_restore.h
Line 1 (on Diff #306122): The header banner is missing here. Also, compiler-rt seems to use a mix of .h and .inc files. I'm not sure which one is more canonical, but the existing one in the riscv directory is a .inc.
Lines 9–19 (on Diff #306122): clang-format is complaining here and suggesting that you remove all indentation. You can still indent it: the existing style is to leave the `#` in the first column and indent the rest of the line.

luismarques added reviewers: luismarques, asb, lenary. · Nov 20 2020, 4:44 AM

For what it's worth, libgcc seems to group by 2 for RV64 and by 4 for RV32, i.e. it always rounds up to a multiple of 16 bytes as that's an easy way to preserve stack alignment requirements.
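The rounding described above can be checked with a little arithmetic. The sketch below is only a C model (the real entry points are hand-written assembly); it assumes the save area holds `ra` plus `s0`..`s(N-1)`, rounded up to a multiple of 16 bytes, and all helper names in it are hypothetical.

```c
#include <assert.h>

/* C model of the save-area sizing for __riscv_save_N / __riscv_restore_N.
   Assumption: the area holds ra plus s0..s(N-1), rounded up to 16 bytes.
   All helper names are hypothetical; the real entry points are assembly. */

/* Bytes per general-purpose register for XLEN 32 or 64. */
static unsigned reg_bytes(unsigned xlen) {
  assert(xlen == 32 || xlen == 64);
  return xlen / 8;
}

/* Registers per 16-byte chunk: 4 on RV32, 2 on RV64. */
static unsigned group_size(unsigned xlen) {
  return 16 / reg_bytes(xlen);
}

/* Stack bytes used when saving N callee-saved registers plus ra. */
static unsigned save_area_bytes(unsigned n, unsigned xlen) {
  assert(n <= 12);
  unsigned raw = (n + 1) * reg_bytes(xlen);
  return (raw + 15u) & ~15u; /* round up to a multiple of 16 */
}

/* Entry points with equal indices can share one routine body, e.g.
   N = 4..7 on RV32 all need a 32-byte area. */
static unsigned group_index(unsigned n, unsigned xlen) {
  return save_area_bytes(n, xlen) / 16;
}
```

For example, `save_area_bytes(12, 32)` is 64 and `save_area_bytes(12, 64)` is 112, which matches the `addi sp, sp, -64` and `addi sp, sp, -112` at the top of `__riscv_save_12` in the diff below, while `group_size` yields the 4 (RV32) and 2 (RV64) grouping.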

In D91717#2407884, @luismarques wrote:

It seems a bit excessive to me to coalesce the entry points into bundles of 4. Do you have any particular benchmarking data or reasoning that supports choosing that threshold?
Also, shouldn't this implementation include CFI directives?

I used bundles of 4 just to follow the behaviour I saw in libgcc, and the grouping of 2 for rv64 seemed a bit too fine-grained. I'm not sure what the original justification for the coalescing into groups of 2/4 was in libgcc.

I'll update to account for other suggested changes and see if I can find any benchmarks which show the tradeoff for the grouping threshold.

In D91717#2411334, @edward-jones wrote:

I used bundles of 4 just to follow the behaviour I saw in libgcc, and the grouping of 2 for rv64 seemed a bit too fine-grained. I'm not sure what the original justification for the coalescing into groups of 2/4 was in libgcc.
I'll update to account for other suggested changes and see if I can find any benchmarks which show the tradeoff for the grouping threshold.

Thanks. After thinking about it some more I don't see any significant issue with a bundle size of 4.
I also examined the assembly more thoroughly and tested it, and it seemed correct.

In D91717#2411334, @edward-jones wrote:

I used bundles of 4 just to follow the behaviour I saw in libgcc, and the grouping of 2 for rv64 seemed a bit too fine-grained. I'm not sure what the original justification for the coalescing into groups of 2/4 was in libgcc.

Because the stack alignment is 16 bytes; see my earlier comment.

In D91717#2411334, @edward-jones wrote:

I used bundles of 4 just to follow the behaviour I saw in libgcc, and the grouping of 2 for rv64 seemed a bit too fine-grained. I'm not sure what the original justification for the coalescing into groups of 2/4 was in libgcc.

You have to store the correct number of registers in order to access the stack above the libcall save area, for instance arguments passed in via the stack. If you spill too many (in some cases two extra registers), these offset calculations will be wrong and you'll read the wrong argument.
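The point about offsets can be made concrete. This C sketch is a model only: the function names are hypothetical, and it assumes the compiler locates stack-passed arguments at a fixed offset just past the save area it expects the runtime to allocate. Under that assumption, a runtime that spilled in groups of 4 on RV64 while the compiler assumed groups of 2 would be off by 16 bytes (two 8-byte registers) for some values of N.

```c
#include <assert.h>

/* Hypothetical model: bytes of stack occupied by the libcall save area for
   N callee-saved registers plus ra, when registers are spilled in groups of
   `group` registers of `reg_bytes` bytes each. */
static unsigned save_area(unsigned n, unsigned group, unsigned reg_bytes) {
  assert(n <= 12 && group > 0);
  unsigned regs = n + 1;                        /* s0..s(N-1) plus ra */
  unsigned groups = (regs + group - 1) / group; /* round up to whole groups */
  return groups * group * reg_bytes;
}

/* Signed distance (bytes) between where an incoming stack argument really
   is and where the compiler expects it, if the compiler assumes groups of
   `compiler_group` but the runtime spills in groups of `runtime_group`. */
static int arg_offset_error(unsigned n, unsigned compiler_group,
                            unsigned runtime_group, unsigned reg_bytes) {
  return (int)save_area(n, runtime_group, reg_bytes) -
         (int)save_area(n, compiler_group, reg_bytes);
}
```

With 8-byte registers, `arg_offset_error(0, 2, 4, 8)` is 16 while `arg_offset_error(3, 2, 4, 8)` is 0, so the mismatch only shows up for some N, which makes it easy to miss in testing.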

In D91717#2411369, @jrtc27 wrote:

Because the stack alignment is 16 bytes; see my earlier comment.

Thanks for your earlier comment. I did think a bit about some possible alternative implementations, which could perhaps make some interesting trade-offs while properly preserving the stack alignment, but in the end the approach used in libgcc and this patch is probably the most sensible one.

In D91717#2411369, @jrtc27 wrote:

Because the stack alignment is 16 bytes; see my earlier comment.

Ah, apologies, I missed your comment. So libgcc groups them into the smallest groups it can whilst maintaining alignment.

I could switch rv64 to use groups of 2, then, if matching libgcc would be advantageous. The only reason for using groups of 4 for both rv32 and rv64 is that it made the implementation more compact, but that's obviously not the metric to optimize for in an emulation library.

lenary resigned from this revision. · Jan 14 2021, 9:43 AM

In D91717#2411400, @edward-jones wrote:

I could switch rv64 to use groups of 2, then, if matching libgcc would be advantageous. The only reason for using groups of 4 for both rv32 and rv64 is that it made the implementation more compact, but that's obviously not the metric to optimize for in an emulation library.

While keeping them both using a group of 4 would be reasonable (and has a neater implementation), I think overall it would be best to switch rv64 to groups of 2. If you made that change and addressed the review nits I think this would be an easy accept and merge :-)

@jrtc27 do you agree it would be worthwhile to make the change for rv64?

I've rebased, switched to a grouping of 2 for rv64 and 4 for rv32, and fixed the formatting/comments.

Herald added a subscriber: vkmr. · Mar 1 2021, 7:30 AM

Now that RV32 and RV64 have separate implementations, is there still a point in keeping the LOAD/STORE/STRIDE macros?

In D91717#2594267, @luismarques wrote:

Now that RV32 and RV64 have separate implementations, is there still a point in keeping the LOAD/STORE/STRIDE macros?

Probably not. I wasn't sure whether the macros made it more readable.

In D91717#2594301, @edward-jones wrote:

Probably not. I wasn't sure whether the macros made it more readable.

I would probably remove them now.
Also, even if we kept the macros we probably don't really need the .inc here, do we?
I suppose we also don't need the .inc for the RISC-V multiply builtins, although there it's a slightly different and more subtle situation.

Harbormaster completed remote builds in B91334: Diff 327113. · Mar 1 2021, 8:10 AM

lenary removed a subscriber: lenary. · Mar 1 2021, 2:56 PM

edward-jones updated this revision to Diff 330620. · Mar 15 2021, 5:43 AM

LGTM. Thanks!

This revision is now accepted and ready to land. · Mar 15 2021, 5:57 AM

Harbormaster completed remote builds in B93789: Diff 330620. · Mar 15 2021, 6:24 AM

Closed by commit rGb136a74efc54: [RISCV][compiler-rt] Add support for save-restore (authored by edward-jones). · Mar 15 2021, 8:59 AM

This revision was automatically updated to reflect the committed changes.

edward-jones added a commit: rGb136a74efc54: [RISCV][compiler-rt] Add support for save-restore.

jrtc27 mentioned this in rG3c885190af21: [RISCV][compiler-rt] Add missing __riscv_save_1/0 labels for RV64. · Sep 15 2021, 6:43 AM

Revision Contents

- compiler-rt/lib/builtins/CMakeLists.txt: 7 lines
- compiler-rt/lib/builtins/riscv/restore.S: 166 lines
- compiler-rt/lib/builtins/riscv/save.S: 184 lines

Diff 330680

compiler-rt/lib/builtins/CMakeLists.txt

(621 unchanged lines not shown; enclosing context: set(powerpc64_SOURCES)
 ppc/floattitf.c
 ppc/fixtfti.c
 ppc/fixunstfti.c
 ${powerpc64_SOURCES}
 )
 endif()
 set(powerpc64le_SOURCES ${powerpc64_SOURCES})

-set(riscv_SOURCES ${GENERIC_SOURCES} ${GENERIC_TF_SOURCES})
+set(riscv_SOURCES
+riscv/save.S
+riscv/restore.S
+${GENERIC_SOURCES}
+${GENERIC_TF_SOURCES}
+)
 set(riscv32_SOURCES
 riscv/mulsi3.S
 ${riscv_SOURCES}
 )
 set(riscv64_SOURCES
 riscv/muldi3.S
 ${riscv_SOURCES}
 )
(103 unchanged lines not shown)

compiler-rt/lib/builtins/riscv/restore.S

This file was added.

				//===-- restore.S - restore up to 12 callee-save registers ----------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// Multiple entry points depending on number of registers to restore
				//
				//===----------------------------------------------------------------------===//

				// All of the entry points are in the same section since we rely on many of
				// them falling through into each other and don't want the linker to
				// accidentally split them up, garbage collect, or reorder them.
				//
				// The entry points are grouped up into 2s for rv64 and 4s for rv32 since this
				// is the minimum grouping which will maintain the required 16-byte stack
				// alignment.

				.text

				#if __riscv_xlen == 32

				.globl __riscv_restore_12
				.type __riscv_restore_12,@function
				__riscv_restore_12:
				lw s11, 12(sp)
				addi sp, sp, 16
				// fallthrough into __riscv_restore_11/10/9/8

				.globl __riscv_restore_11
				.type __riscv_restore_11,@function
				.globl __riscv_restore_10
				.type __riscv_restore_10,@function
				.globl __riscv_restore_9
				.type __riscv_restore_9,@function
				.globl __riscv_restore_8
				.type __riscv_restore_8,@function
				__riscv_restore_11:
				__riscv_restore_10:
				__riscv_restore_9:
				__riscv_restore_8:
				lw s10, 0(sp)
				lw s9, 4(sp)
				lw s8, 8(sp)
				lw s7, 12(sp)
				addi sp, sp, 16
				// fallthrough into __riscv_restore_7/6/5/4

				.globl __riscv_restore_7
				.type __riscv_restore_7,@function
				.globl __riscv_restore_6
				.type __riscv_restore_6,@function
				.globl __riscv_restore_5
				.type __riscv_restore_5,@function
				.globl __riscv_restore_4
				.type __riscv_restore_4,@function
				__riscv_restore_7:
				__riscv_restore_6:
				__riscv_restore_5:
				__riscv_restore_4:
				lw s6, 0(sp)
				lw s5, 4(sp)
				lw s4, 8(sp)
				lw s3, 12(sp)
				addi sp, sp, 16
				// fallthrough into __riscv_restore_3/2/1/0

				.globl __riscv_restore_3
				.type __riscv_restore_3,@function
				.globl __riscv_restore_2
				.type __riscv_restore_2,@function
				.globl __riscv_restore_1
				.type __riscv_restore_1,@function
				.globl __riscv_restore_0
				.type __riscv_restore_0,@function
				__riscv_restore_3:
				__riscv_restore_2:
				__riscv_restore_1:
				__riscv_restore_0:
				lw s2, 0(sp)
				lw s1, 4(sp)
				lw s0, 8(sp)
				lw ra, 12(sp)
				addi sp, sp, 16
				ret

				#elif __riscv_xlen == 64

				.globl __riscv_restore_12
				.type __riscv_restore_12,@function
				__riscv_restore_12:
				ld s11, 8(sp)
				addi sp, sp, 16
				// fallthrough into __riscv_restore_11/10

				.globl __riscv_restore_11
				.type __riscv_restore_11,@function
				.globl __riscv_restore_10
				.type __riscv_restore_10,@function
				__riscv_restore_11:
				__riscv_restore_10:
				ld s10, 0(sp)
				ld s9, 8(sp)
				addi sp, sp, 16
				// fallthrough into __riscv_restore_9/8

				.globl __riscv_restore_9
				.type __riscv_restore_9,@function
				.globl __riscv_restore_8
				.type __riscv_restore_8,@function
				__riscv_restore_9:
				__riscv_restore_8:
				ld s8, 0(sp)
				ld s7, 8(sp)
				addi sp, sp, 16
				// fallthrough into __riscv_restore_7/6

				.globl __riscv_restore_7
				.type __riscv_restore_7,@function
				.globl __riscv_restore_6
				.type __riscv_restore_6,@function
				__riscv_restore_7:
				__riscv_restore_6:
				ld s6, 0(sp)
				ld s5, 8(sp)
				addi sp, sp, 16
				// fallthrough into __riscv_restore_5/4

				.globl __riscv_restore_5
				.type __riscv_restore_5,@function
				.globl __riscv_restore_4
				.type __riscv_restore_4,@function
				__riscv_restore_5:
				__riscv_restore_4:
				ld s4, 0(sp)
				ld s3, 8(sp)
				addi sp, sp, 16
				// fallthrough into __riscv_restore_3/2

				.globl __riscv_restore_3
				.type __riscv_restore_3,@function
				.globl __riscv_restore_2
				.type __riscv_restore_2,@function
				.globl __riscv_restore_1
				.type __riscv_restore_1,@function
				.globl __riscv_restore_0
				.type __riscv_restore_0,@function
				__riscv_restore_3:
				__riscv_restore_2:
				ld s2, 0(sp)
				ld s1, 8(sp)
				addi sp, sp, 16
				// fallthrough into __riscv_restore_1/0

				__riscv_restore_1:
				__riscv_restore_0:
				ld s0, 0(sp)
				ld ra, 8(sp)
				addi sp, sp, 16
				ret

				#else
				# error "xlen must be 32 or 64 for save-restore implementation"
				#endif
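The restore routines never test N at run time: entering at the right label is enough, because each chunk loads from fixed offsets and then pops 16 bytes. The C sketch below is a model only, under the layout assumptions stated in its comment (helper names are hypothetical). For a given N it computes how many 16-byte pops precede each load and which immediate that load uses, so the results can be compared with the `lw`/`ld` offsets in the listing above.

```c
#include <assert.h>

/* C model of where __riscv_restore_N finds each register.
   Assumptions, mirroring the listing: ra is in the highest slot of the
   save area, s0, s1, ... follow downward, and the area is rounded up to
   16 bytes. Register -1 stands for ra, i >= 0 for s<i>. Names are hypothetical. */

static unsigned save_area_bytes(unsigned n, unsigned xlen) {
  assert((xlen == 32 || xlen == 64) && n <= 12);
  unsigned raw = (n + 1) * (xlen / 8);
  return (raw + 15u) & ~15u;
}

/* Byte offset of a register's slot from sp on entry to __riscv_restore_N. */
static unsigned slot_offset(int reg, unsigned n, unsigned xlen) {
  unsigned area = save_area_bytes(n, xlen);
  unsigned w = xlen / 8;
  assert(reg >= -1 && reg <= 11);
  assert((unsigned)(reg + 2) * w <= area); /* slot lies inside the area */
  return area - (unsigned)(reg + 2) * w;
}

/* Each chunk ends with `addi sp, sp, 16`, so a register is loaded after
   slot / 16 pops, using immediate slot % 16 from the current sp. */
static unsigned pops_before_load(int reg, unsigned n, unsigned xlen) {
  return slot_offset(reg, n, xlen) / 16;
}

static unsigned load_immediate(int reg, unsigned n, unsigned xlen) {
  return slot_offset(reg, n, xlen) % 16;
}
```

For instance, on RV32 with N = 12 the model puts `s11` at offset 12 and `ra` at immediate 12 after three pops, matching `lw s11, 12(sp)` and the final `lw ra, 12(sp)`.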

compiler-rt/lib/builtins/riscv/save.S

This file was added.

				//===-- save.S - save up to 12 callee-saved registers ---------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// Multiple entry points depending on number of registers to save
				//
				//===----------------------------------------------------------------------===//

				// The entry points are grouped up into 2s for rv64 and 4s for rv32 since this
				// is the minimum grouping which will maintain the required 16-byte stack
				// alignment.

				.text

				#if __riscv_xlen == 32

				.globl __riscv_save_12
				.type __riscv_save_12,@function
				__riscv_save_12:
				addi sp, sp, -64
				mv t1, zero
				sw s11, 12(sp)
				j .Lriscv_save_11_8

				.globl __riscv_save_11
				.type __riscv_save_11,@function
				.globl __riscv_save_10
				.type __riscv_save_10,@function
				.globl __riscv_save_9
				.type __riscv_save_9,@function
				.globl __riscv_save_8
				.type __riscv_save_8,@function
				__riscv_save_11:
				__riscv_save_10:
				__riscv_save_9:
				__riscv_save_8:
				addi sp, sp, -64
				li t1, 16
				.Lriscv_save_11_8:
				sw s10, 16(sp)
				sw s9, 20(sp)
				sw s8, 24(sp)
				sw s7, 28(sp)
				j .Lriscv_save_7_4

				.globl __riscv_save_7
				.type __riscv_save_7,@function
				.globl __riscv_save_6
				.type __riscv_save_6,@function
				.globl __riscv_save_5
				.type __riscv_save_5,@function
				.globl __riscv_save_4
				.type __riscv_save_4,@function
				__riscv_save_7:
				__riscv_save_6:
				__riscv_save_5:
				__riscv_save_4:
				addi sp, sp, -64
				li t1, 32
				.Lriscv_save_7_4:
				sw s6, 32(sp)
				sw s5, 36(sp)
				sw s4, 40(sp)
				sw s3, 44(sp)
				sw s2, 48(sp)
				sw s1, 52(sp)
				sw s0, 56(sp)
				sw ra, 60(sp)
				add sp, sp, t1
				jr t0

				.globl __riscv_save_3
				.type __riscv_save_3,@function
				.globl __riscv_save_2
				.type __riscv_save_2,@function
				.globl __riscv_save_1
				.type __riscv_save_1,@function
				.globl __riscv_save_0
				.type __riscv_save_0,@function
				__riscv_save_3:
				__riscv_save_2:
				__riscv_save_1:
				__riscv_save_0:
				addi sp, sp, -16
				sw s2, 0(sp)
				sw s1, 4(sp)
				sw s0, 8(sp)
				sw ra, 12(sp)
				jr t0

				#elif __riscv_xlen == 64

				.globl __riscv_save_12
				.type __riscv_save_12,@function
				__riscv_save_12:
				addi sp, sp, -112
				mv t1, zero
				sd s11, 8(sp)
				j .Lriscv_save_11_10

				.globl __riscv_save_11
				.type __riscv_save_11,@function
				.globl __riscv_save_10
				.type __riscv_save_10,@function
				__riscv_save_11:
				__riscv_save_10:
				addi sp, sp, -112
				li t1, 16
				.Lriscv_save_11_10:
				sd s10, 16(sp)
				sd s9, 24(sp)
				j .Lriscv_save_9_8

				.globl __riscv_save_9
				.type __riscv_save_9,@function
				.globl __riscv_save_8
				.type __riscv_save_8,@function
				__riscv_save_9:
				__riscv_save_8:
				addi sp, sp, -112
				li t1, 32
				.Lriscv_save_9_8:
				sd s8, 32(sp)
				sd s7, 40(sp)
				j .Lriscv_save_7_6

				.globl __riscv_save_7
				.type __riscv_save_7,@function
				.globl __riscv_save_6
				.type __riscv_save_6,@function
				__riscv_save_7:
				__riscv_save_6:
				addi sp, sp, -112
				li t1, 48
				.Lriscv_save_7_6:
				sd s6, 48(sp)
				sd s5, 56(sp)
				j .Lriscv_save_5_4

				.globl __riscv_save_5
				.type __riscv_save_5,@function
				.globl __riscv_save_4
				.type __riscv_save_4,@function
				__riscv_save_5:
				__riscv_save_4:
				addi sp, sp, -112
				li t1, 64
				.Lriscv_save_5_4:
				sd s4, 64(sp)
				sd s3, 72(sp)
				j .Lriscv_save_3_2

				.globl __riscv_save_3
				.type __riscv_save_3,@function
				.globl __riscv_save_2
				.type __riscv_save_2,@function
				__riscv_save_3:
				__riscv_save_2:
				addi sp, sp, -112
				li t1, 80
				.Lriscv_save_3_2:
				sd s2, 80(sp)
				sd s1, 88(sp)
				sd s0, 96(sp)
				sd ra, 104(sp)
				add sp, sp, t1
				jr t0

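				.globl __riscv_save_1
				.type __riscv_save_1,@function
				.globl __riscv_save_0
				.type __riscv_save_0,@function
				// Entry labels for __riscv_save_1/0 (missing from the committed diff; see rG3c885190af21)
				__riscv_save_1:
				__riscv_save_0:
				addi sp, sp, -16
				sd s0, 0(sp)
				sd ra, 8(sp)
				jr t0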

				#else
				# error "xlen must be 32 or 64 for save-restore implementation"
				#endif
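Every save entry point outside the smallest group allocates the same full frame and then uses `t1` to shrink it, so the shared tail can store the remaining registers at fixed offsets. A C model of that arithmetic (hypothetical names; it assumes the 16-byte rounding described in the comment at the top of save.S) lets the `li t1, ...` constants in the listing above be checked against the frame sizes that the restore routines pop.

```c
#include <assert.h>

/* C model of the t1 trick in save.S. Every entry outside the smallest group
   allocates the full __riscv_save_12 frame, stores its registers, then does
   `add sp, sp, t1` to give back the unused low part. The smallest group
   (N <= 3 on RV32, N <= 1 on RV64) takes a separate 16-byte path instead.
   Helper names are hypothetical. */

static unsigned save_area_bytes(unsigned n, unsigned xlen) {
  assert((xlen == 32 || xlen == 64) && n <= 12);
  unsigned raw = (n + 1) * (xlen / 8);
  return (raw + 15u) & ~15u;
}

/* Full frame: 64 bytes on RV32, 112 bytes on RV64. */
static unsigned full_frame(unsigned xlen) {
  return save_area_bytes(12, xlen);
}

/* Value placed in t1 by __riscv_save_N (0 for __riscv_save_12). */
static unsigned t1_for(unsigned n, unsigned xlen) {
  assert(save_area_bytes(n, xlen) > 16); /* not the smallest group */
  return full_frame(xlen) - save_area_bytes(n, xlen);
}

/* Net change in sp after `addi sp, sp, -full` then `add sp, sp, t1`. */
static int net_sp_change(unsigned n, unsigned xlen) {
  return -(int)full_frame(xlen) + (int)t1_for(n, xlen);
}
```

The trade-off is that only one copy of the final stores is needed (the `.Lriscv_save_7_4` tail on RV32, `.Lriscv_save_3_2` on RV64), at the cost of an extra `add` on every path except the smallest group.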