Page MenuHomePhabricator

[RISCV] Support vector types in combination with fastcc

Authored by frasercrmck on May 14 2021, 9:04 AM.



This patch extends the RISC-V lowering of the 'fastcc' calling
convention to vector types, both fixed-length and scalable. Without this
patch, any function passing or returning vector types by value would
throw a compiler error.

Vectors are handled in 'fastcc' much as they are in the default calling
convention, the noticeable difference being the extended set of scalar
GPR registers that can be used to pass vectors indirectly.

Diff Detail

Event Timeline

frasercrmck created this revision.May 14 2021, 9:04 AM
frasercrmck requested review of this revision.May 14 2021, 9:04 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 14 2021, 9:04 AM
  • rebase
  • fix zero-alignment for small mask vectors
HsiangKai added inline comments.Sat, May 22, 8:50 AM

Should we allow all the vector registers used to pass vector arguments under fastcc, instead of limiting to v8 to v23?

frasercrmck added inline comments.Mon, May 24, 3:38 AM

Sure, that's a possibility. I think it opens up some interesting new situations if we combine that with the first mask going to v0, like (with max-lmul=8):

define fastcc <4 x i1> @foo(<32 x i32> %x, <8 x i32> %y, <32 x i32> %z, <32 x i32> %w, <4 x i1> %m1, <4 x i1> %m2, <4 x i1> %m3) {
; %x -> $v8m8
; %y -> $v2m2
; %z -> $v16m8
; %w -> $v24m8
; %m1 -> $v0
; %m2 -> $v1
; %m3 -> $v4

Do you think that's worth it? It's slightly harder to reason about (finding which operand goes to which register takes a bit of back-and-forth) but the allocation is certainly improved. I suppose that's the goal of fastcc.

frasercrmck added inline comments.Mon, May 24, 5:47 AM

Now I'm seeing a surprising register allocation failure when I allocate all registers to function arguments.

define fastcc <vscale x 32 x i32> @foo(<vscale x 32 x i32> %x, <vscale x 32 x i32> %y, <vscale x 32 x i32> %z, i32 %w) {

Gives me the following lowering code, with %x -> v0m8/v8m8, %y -> v16m8/v24m8 and %z -> indirect(x10) and indirect(x12):

bb.0 (%ir-block.0):
  liveins: $v0m8, $v8m8, $v16m8, $v24m8, $x10, $x12
  %5:gpr = COPY $x12
  %4:gpr = COPY $x10
  %3:vrm8 = COPY $v24m8
  %2:vrm8 = COPY $v16m8
  %1:vrm8 = COPY $v8m8
  %0:vrm8 = COPY $v0m8
  %6:vrm8 = VL8RE32_V %4:gpr :: (load unknown-size, align 64)
  %7:gpr = ADDI %4:gpr, 64
  %8:vrm8 = VL8RE32_V %7:gpr :: (load unknown-size, align 64)

Then the machine scheduler decides to do this:

0B	bb.0 (%ir-block.0):
	  liveins: $v0m8, $v8m8, $v16m8, $v24m8, $x10, $x12
32B	  %4:gpr = COPY $x10
128B	  %7:gpr = ADDI %4:gpr, 64
136B	  %6:vrm8 = VL8RE32_V %4:gpr :: (load unknown-size, align 64)
144B	  %8:vrm8 = VL8RE32_V %7:gpr :: (load unknown-size, align 64)
152B	  %5:gpr = COPY $x12
160B	  %3:vrm8 = COPY $v24m8
168B	  %2:vrm8 = COPY $v16m8
176B	  %1:vrm8 = COPY $v8m8
184B	  %0:vrm8 = COPY $v0m8

And then register allocator is unable to allocate for %6 or %8 since all physical registers are occupied. I'm surprised the scheduler made that change.

I'm not the most familiar with other targets and their calling conventions, but perhaps this isn't supported. A compromise could be to leave v24m8 free but continue to use v1-v7, which is still more than the base calling convention supports.

Any thoughts?

HsiangKai added inline comments.Wed, May 26, 1:09 AM

I am curious why the register allocator does not spill the vector registers?

By the way, it is strange to add 64 to the base address($x10) for the arguments. It should add 8 x vlenb, right? I know it is another issue not related to the patch. We could prepare another patch to fix it.

frasercrmck added inline comments.Thu, May 27, 7:55 AM

Are you asking why it doesn't spill one of the liveins so it can move %6 into it? I'm not sure. I'm not an expert on the register allocator in any way, but I have seen issues with calling conventions + regalloc in the past. I wonder if it isn't able to spill physical livein registers.

Yes, it's a bit suspicious that it loads from a fixed offset from the base address. Probably not caused by this patch but I'll look into that.

frasercrmck added inline comments.Thu, May 27, 9:12 AM

Please see D103262 for a fix for that offset issue.

HsiangKai added inline comments.Sun, May 30, 7:30 AM

Got it. I think we could use v8 to v23 for fastcc first. If we want to use more vector registers for fastcc, we could refine it later. It has no impact on the standard calling convention.

You need to rebase on D103262.

frasercrmck marked an inline comment as done.
  • use MaybeAlign over std::max
frasercrmck added inline comments.Mon, May 31, 4:15 AM

Yeah I think that'd work. Despite the potential register-allocation issues, it's certainly not trivially better to use all registers in fastcc. I saw several slowdowns in testing when I used v0-v23 compared with v8-v23.

This revision is now accepted and ready to land.Tue, Jun 1, 12:28 AM
  • rebase to see how the buildbots get on
This revision was landed with ongoing or failed builds.Tue, Jun 1, 2:39 AM
This revision was automatically updated to reflect the committed changes.