This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] change rvv frame layout
ClosedPublic

Authored by StephenFan on Feb 19 2021, 10:22 PM.

Details

Summary

This patch change the rvv frame layout that proposed in D94465. According to D94465, to eliminate the rvv frame index, create temp virtual register is needed. This virtual register should be scavenged by class RegsiterScavenger. If the machine function has other unused registers, there is no problem. But if there isn't unused registers, we need a emergency spill slot. Because of the emergency spill slot belongs to the scalar local variables field, to access emergency spill slot, we need a temp virtual register again. This makes the compiler report the "Incomplete scavenging after 2nd pass" error. So I change the rvv frame layout as follows:
if use sp or bp as base register:

|--------------------------------------|
|   arguments passed on the stack      |
|--------------------------------------|<--- fp
|   callee saved registers             |
|--------------------------------------|
|   rvv vector objects(local variables |
|   and outgoing arguments             |
|--------------------------------------|
|   realignment field                  |
|--------------------------------------|
|   scalar local variable(also contains|
|   emergency spill slot)              |
|--------------------------------------|<--- bp
|   variable-sized local variables     |
|--------------------------------------|<--- sp

if use fp as base register:

|--------------------------------------|
|   arguments passed on the stack      |
|--------------------------------------|<--- fp
|   callee saved registers             |
|--------------------------------------|
|   scalar local variable(also contains|
|   emergency spill slot)              |
|--------------------------------------|
|   rvv vector objects(local variables |
|   and outgoing arguments             |
|--------------------------------------|
|   variable-sized local variables     |
|--------------------------------------|<--- sp

Diff Detail

Event Timeline

StephenFan created this revision.Feb 19 2021, 10:22 PM
StephenFan requested review of this revision.Feb 19 2021, 10:22 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 19 2021, 10:22 PM
StephenFan retitled this revision from change rvv frame layout to [RISCV] change rvv frame layout.Feb 19 2021, 10:25 PM
StephenFan edited the summary of this revision. (Show Details)
craig.topper added inline comments.Feb 19 2021, 10:27 PM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
652

This formatting looks weird.

665

Same here

StephenFan added inline comments.Feb 19 2021, 10:29 PM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
652

Oh, Sorry, It was caused by clang-tidy, I will fix it right now,

address @criag.topper 's comment

StephenFan marked 2 inline comments as done.Feb 19 2021, 10:42 PM

add test cases that deleted incautious

Harbormaster completed remote builds in B90029: Diff 325161.

address @criag.topper 's comment

Sorry, @craig.topper I typed your name wrong.

rogfer01 added a comment.EditedFeb 20 2021, 1:29 PM

Hi @StephenFan. I wonder if we want to do this only when we index via sp.

My understanding is that the emergency slot will be close enough when we index via fp/bp in the previous layout. In the previous layout, when indexing a scalar via sp we have to cross the RVV part (which needs an extra register) to reach the scalar registers (where the emergency slot will be located). IIUC, your approach it fixes the sp case but complicates the fp/bp case to access the emergency slot because now we need to jump over RVV.

My suggestion means we want to have the RVV vectors as far from the frame pointer as possible (the one being used sp/fp/bp), so scalars are still straightforward to access even in the presence of RVV. This means different layouts when indexing with sp and when indexing with fp/bp. In summary: your new layout for sp but the previous one for fp/bp. Does this make sense?

Hi @StephenFan. I wonder if we want to do this only when we index via sp.

My understanding is that the emergency slot will be close enough when we index via fp/bp in the previous layout. In the previous layout, when indexing a scalar via sp we have to cross the RVV part (which needs an extra register) to reach the scalar registers (where the emergency slot will be located). IIUC, your approach it fixes the sp case but complicates the fp/bp case to access the emergency slot because now we need to jump over RVV.

My suggestion means we want to have the RVV vectors as far from the frame pointer as possible (the one being used sp/fp/bp), so scalars are still straightforward to access even in the presence of RVV. This means different layouts when indexing with sp and when indexing with fp/bp. In summary: your new layout for sp but the previous one for fp/bp. Does this make sense?

Make sense to me. I didn't consider the case that needs fp to access the emergency spill slot. In my patch, It seems that make use of the bp register is necessary when variable-sized local variables and rvv stack objects exist at the same time. Because if uses fp to access the emergency spill slot needs an extra register. In the original patch, make use of the fp register is necessary when rvv stack objects exist. I prefer the current frame layout, Because it only needs a bp register when variable-sized local variables and rvv stack objects exist. Do you agree it?

Hi @StephenFan. I wonder if we want to do this only when we index via sp.

My understanding is that the emergency slot will be close enough when we index via fp/bp in the previous layout. In the previous layout, when indexing a scalar via sp we have to cross the RVV part (which needs an extra register) to reach the scalar registers (where the emergency slot will be located). IIUC, your approach it fixes the sp case but complicates the fp/bp case to access the emergency slot because now we need to jump over RVV.

My suggestion means we want to have the RVV vectors as far from the frame pointer as possible (the one being used sp/fp/bp), so scalars are still straightforward to access even in the presence of RVV. This means different layouts when indexing with sp and when indexing with fp/bp. In summary: your new layout for sp but the previous one for fp/bp. Does this make sense?

Make sense to me. I didn't consider the case that needs fp to access the emergency spill slot. In my patch, It seems that make use of the bp register is necessary when variable-sized local variables and rvv stack objects exist at the same time. Because if uses fp to access the emergency spill slot needs an extra register. In the original patch, make use of the fp register is necessary when rvv stack objects exist. I prefer the current frame layout, Because it only needs a bp register when variable-sized local variables and rvv stack objects exist. Do you agree it?

Sorry, the last sentence is grammatically wrong. What I mean is " what about your opinion?"

Hi @StephenFan. I wonder if we want to do this only when we index via sp.

My understanding is that the emergency slot will be close enough when we index via fp/bp in the previous layout. In the previous layout, when indexing a scalar via sp we have to cross the RVV part (which needs an extra register) to reach the scalar registers (where the emergency slot will be located). IIUC, your approach it fixes the sp case but complicates the fp/bp case to access the emergency slot because now we need to jump over RVV.

My suggestion means we want to have the RVV vectors as far from the frame pointer as possible (the one being used sp/fp/bp), so scalars are still straightforward to access even in the presence of RVV. This means different layouts when indexing with sp and when indexing with fp/bp. In summary: your new layout for sp but the previous one for fp/bp. Does this make sense?

Make sense to me. I didn't consider the case that needs fp to access the emergency spill slot. In my patch, It seems that make use of the bp register is necessary when variable-sized local variables and rvv stack objects exist at the same time. Because if uses fp to access the emergency spill slot needs an extra register. In the original patch, make use of the fp register is necessary when rvv stack objects exist. I prefer the current frame layout, Because it only needs a bp register when variable-sized local variables and rvv stack objects exist. Do you agree it?

When it is necessary to have fp, the frame layout will look like as below.

|---------------------------------| <- frame pointer (fp)
| scalar callee-saved registers   |
|---------------------------------|
| scalar local variables          |
|---------------------------------|
| ///// realignment /////         |
|---------------------------------|
| scalar outgoing arguments       |
|---------------------------------|
| RVV local variables &&          |
| RVV outgoing arguments          |
|---------------------------------| <- end of frame (sp)

What do you mean "if uses fp to access the emergency spill slot needs an extra register"?

I agree with Roger. We only need to change the frame layout when accessing frame objects only using sp. That is, we should put RVV objects as far as 'base' as possible. The 'base' could be sp, fp or bp.

Hi @StephenFan. I wonder if we want to do this only when we index via sp.

My understanding is that the emergency slot will be close enough when we index via fp/bp in the previous layout. In the previous layout, when indexing a scalar via sp we have to cross the RVV part (which needs an extra register) to reach the scalar registers (where the emergency slot will be located). IIUC, your approach it fixes the sp case but complicates the fp/bp case to access the emergency slot because now we need to jump over RVV.

My suggestion means we want to have the RVV vectors as far from the frame pointer as possible (the one being used sp/fp/bp), so scalars are still straightforward to access even in the presence of RVV. This means different layouts when indexing with sp and when indexing with fp/bp. In summary: your new layout for sp but the previous one for fp/bp. Does this make sense?

Make sense to me. I didn't consider the case that needs fp to access the emergency spill slot. In my patch, It seems that make use of the bp register is necessary when variable-sized local variables and rvv stack objects exist at the same time. Because if uses fp to access the emergency spill slot needs an extra register. In the original patch, make use of the fp register is necessary when rvv stack objects exist. I prefer the current frame layout, Because it only needs a bp register when variable-sized local variables and rvv stack objects exist. Do you agree it?

When it is necessary to have fp, the frame layout will look like as below.

|---------------------------------| <- frame pointer (fp)
| scalar callee-saved registers   |
|---------------------------------|
| scalar local variables          |
|---------------------------------|
| ///// realignment /////         |
|---------------------------------|
| scalar outgoing arguments       |
|---------------------------------|
| RVV local variables &&          |
| RVV outgoing arguments          |
|---------------------------------| <- end of frame (sp)

What do you mean "if uses fp to access the emergency spill slot needs an extra register"?

I agree with Roger. We only need to change the frame layout when accessing frame objects only using sp. That is, we should put RVV objects as far as 'base' as possible. The 'base' could be sp, fp or bp.

What do you mean "if uses fp to access the emergency spill slot needs an extra register"?
Answer: In my patch, an extra register is needed when uses fp to access the emergency spill slot.

According to what @HsiangKai said, I realize that I misunderstood what @rogfer01 said before. Now, I also agree with @rogfer01 . And I will fix this patch.

address @HsiangKai and @rogfer01 's comments

StephenFan edited the summary of this revision. (Show Details)Feb 22 2021, 3:58 AM
rogfer01 added inline comments.Feb 23 2021, 3:18 AM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
652

In the drawing, can you put realignment between callee-saved registers and RVV objects?

Once we realign sp / bp as needed, we lay out the stack as if nothing happened, so the realignment if any, would act as padding below the RVV objects. As it stands now, it looks like the padding is between the scalars and the RVV objects, but this should not be the case given that the realignment is an unknown magnitude (hence the need to keep fp around)

I understand this means that MFI.getStackSize() does not account the realignment (it can't as it is unknown), perhaps we could state this somehow in the drawings? What do you think?

665

Ditto: realignment goes between callee-saved registers and RVV objects.

jrtc27 added inline comments.Feb 23 2021, 9:30 AM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
806–807
831
837
llvm/test/CodeGen/RISCV/rvv/allocate-lmul-2-4-8.ll
8–13

What's up with all the whitespace changes? Those should have been normalised.

Address @rogfer01 's comments.

StephenFan marked 5 inline comments as done.Feb 24 2021, 5:20 AM
StephenFan added inline comments.
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
652

I Agree with you.

StephenFan marked an inline comment as done.

Address @jrtc27 's comments

delete unnecessary white space

rogfer01 added a comment.EditedMar 1 2021, 12:36 AM

Can you add some tests for the cases with overaligned stuff and variable sized + rvv?

Something like this (feel free to improve if needed)

define void @rvv_vla(i64 %n, i64 %i) nounwind {
  %vla.addr = alloca i32, i64 %n

  %v1.addr = alloca <vscale x 1 x i64>
  %v1 = load volatile <vscale x 1 x i64>, <vscale x 1 x i64>* %v1.addr

  %v2.addr = alloca <vscale x 2 x i64>
  %v2 = load volatile <vscale x 2 x i64>, <vscale x 2 x i64>* %v2.addr

  %p = getelementptr i32, i32* %vla.addr, i64 %i
  %s = load volatile i32, i32* %p
  ret void
}

define void @rvv_overaligned(i64 %n, i64 %i) nounwind {
  %overaligned = alloca i32, align 64

  %v1.addr = alloca <vscale x 1 x i64>
  %v1 = load volatile <vscale x 1 x i64>, <vscale x 1 x i64>* %v1.addr

  %v2.addr = alloca <vscale x 2 x i64>
  %v2 = load volatile <vscale x 2 x i64>, <vscale x 2 x i64>* %v2.addr

  %s = load volatile i32, i32* %overaligned, align 64
  ret void
}

and a combined version of both.

StephenFan updated this revision to Diff 327749.Mar 3 2021, 5:02 AM

Address @rogfer01 's comment. Add rvv-framelayout.ll test.

StephenFan updated this revision to Diff 327752.Mar 3 2021, 5:04 AM

Delete white spaces.

rogfer01 accepted this revision.Mar 9 2021, 12:56 AM

Except for the minor comments above, LGTM.

llvm/test/CodeGen/RISCV/rvv/rvv-framelayout.ll
50 ↗(On Diff #327752)

This test doesn't seem to be using %n or %i. My fault: I accidentally copy-pasted it in my earlier suggestion.

87 ↗(On Diff #327752)

typo: rvv_valrvv_vla

This revision is now accepted and ready to land.Mar 9 2021, 12:56 AM

Address @rogfer01 's comment.

This revision was landed with ongoing or failed builds.Mar 13 2021, 12:11 AM
This revision was automatically updated to reflect the committed changes.