This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Frame handling for RISC-V V extension. (2nd. version)
ClosedPublic

Authored by HsiangKai on Jan 11 2021, 9:05 PM.

Details

Summary

This patch proposes how to deal with RISC-V vector frame objects. The
layout of RISC-V vector frame will look like

|---------------------------------|
| scalar callee-saved registers   |
|---------------------------------|
| scalar local variables          |
|---------------------------------|
| scalar outgoing arguments       |
|---------------------------------|
| RVV local variables &&          |
| RVV outgoing arguments          |
|---------------------------------| <- end of frame (sp)

If there is realignment or variable length array in the stack, we will use
frame pointer to access fixed objects and stack pointer to access
non-fixed objects.

|---------------------------------| <- frame pointer (fp)
| scalar callee-saved registers   |
|---------------------------------|
| scalar local variables          |
|---------------------------------|
| ///// realignment /////         |
|---------------------------------|
| scalar outgoing arguments       |
|---------------------------------|
| RVV local variables &&          |
| RVV outgoing arguments          |
|---------------------------------| <- end of frame (sp)

If there are both realignment and variable length array in the stack, we
will use frame pointer to access fixed objects and base pointer to access
non-fixed objects.

|---------------------------------| <- frame pointer (fp)
| scalar callee-saved registers   |
|---------------------------------|
| scalar local variables          |
|---------------------------------|
| ///// realignment /////         |
|---------------------------------| <- base pointer (bp)
| RVV local variables &&          |
| RVV outgoing arguments          |
|---------------------------------|
| /////////////////////////////// |
| variable length array           |
| /////////////////////////////// |
|---------------------------------|
| scalar outgoing arguments       |
|---------------------------------| <- end of frame (sp)

In this version, we do not save the addresses of RVV objects in the
stack. We access them directly through the polynomial expression
(a x VLENB + b). We do not reserve frame pointer when there is any RVV
object in the stack. So, we also access the scalar frame objects through the
polynomial expression (a x VLENB + b) if the access across RVV stack
area.

Diff Detail

Event Timeline

HsiangKai created this revision.Jan 11 2021, 9:05 PM
HsiangKai requested review of this revision.Jan 11 2021, 9:05 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 11 2021, 9:05 PM
Herald added a subscriber: MaskRay. · View Herald Transcript
jrtc27 added inline comments.Jan 11 2021, 11:16 PM
llvm/include/llvm/CodeGen/MIRYamlMapping.h
350–351 ↗(On Diff #315988)

The handling of RISCVVector is always the same as SVEVector when in generic code. Can we not merge the two as they're the same thing just for different architectures?

HsiangKai edited the summary of this revision. (Show Details)Jan 11 2021, 11:46 PM

I prefer this approach. Computing the address is a bit more involved but doesn't need an extra memory access.

It looks to me less hardware machinery gets involved (in practice) to reach the stack slot.

Address @jrtc27's comments.

I also think one direct memory access is better than two memory accesses. I abandoned D93750 first. If there is anyone oppose it, I could reactivate it.

craig.topper added inline comments.Jan 21 2021, 3:36 PM
llvm/include/llvm/CodeGen/TargetFrameLowering.h
27 ↗(On Diff #316011)

Is this just a formatting change?

llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
162

Is Amount expected to be a multiple of 8 or does the division need to take the ceiling?

craig.topper added inline comments.Jan 21 2021, 3:36 PM
llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
167

Is Amount and NumOfVReg always positive? Should we check or assert that NumOfVReg fits in 32 bits before using isPowerOf2_32?

171

Drop this else, the if already returned.

180

Assert that NumOfVReg fits in 12 bits?

192

Is this if the same condition as the one at line 208? If so, how can this be true if we already signalled a fatal_error above?

HsiangKai added inline comments.Jan 21 2021, 6:52 PM
llvm/include/llvm/CodeGen/TargetFrameLowering.h
27 ↗(On Diff #316011)

I will revert it.

HsiangKai added inline comments.Jan 21 2021, 7:25 PM
llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
162

Amount is expected to be a multiple of 8. If the object size is less than 8, it will be set to 8 at assignRVVStackObjectOffsets() in RISCVFrameLowering.cpp.

167

Amount and NumOfVReg are always positive. I will add assertions for the condition.

HsiangKai updated this revision to Diff 318399.Jan 21 2021, 7:30 PM
  • Address @craig.topper's comments.
  • Update the test cases to use v8-v23 as arguments.
HsiangKai marked 5 inline comments as done.Jan 21 2021, 7:31 PM
frasercrmck added inline comments.Jan 22 2021, 9:40 AM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
393

nit: It's a bit odd to have one == 0 comparison and one ! comparison in the same line. I'd favour consistency here.

623

else ...? I think we should probably assert here.

783

This is implicitly converting from int to unsigned. Is using int okay?

llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
154

This parameter named Amount implies the function can be used generically, but the variables and assertions within specifically imply the amount is taken to be "the number of registers". If this function is truly generic I think things like NumOfVReg should be renamed, but if the function makes assumptions about what Amount is then that should be clarified.

Also, if the function is generic maybe it should live in RISCVInstrInfo

HsiangKai added inline comments.Jan 22 2021, 4:51 PM
llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
154

Amount could be the stack size occupied by vector objects or the offset of the vector objects. I think NumOfVReg is appropriate because the stack size or offset are equivalent to the number of vector times vector size(VLENB).

I will move this function to RISCVInstrInfo.

HsiangKai updated this revision to Diff 318697.Jan 22 2021, 5:07 PM

Address @frasercrmck's comments.

HsiangKai updated this revision to Diff 318728.Jan 22 2021, 8:55 PM

Update test cases for vector argument passing by reference.

HsiangKai updated this revision to Diff 318752.Jan 23 2021, 6:14 AM
rogfer01 added inline comments.Jan 24 2021, 2:29 AM
llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
209

I'm not sure whether we won't get some cases where the frame index is fixed but the instruction is actually a RVV load/store.

Something from IR like below (assume the size of the array is correct for the runtime vscale value)

%array = alloca [64 x double]
%vptr = bitcast [64 x double]* %array to <vscale x 1 x double>*
%v = load <vscale x 1 x double>, <vscale x 1 x double>* %vptr
HsiangKai added inline comments.Jan 24 2021, 3:14 PM
llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
209

We may need to add an option to tell the compiler what vector length we are compiling for. Otherwise, how do we confirm the VLS length equals to RVV vector length? Another question is which LMUL do we support to do VLS load/store through RVV instructions, LMUL = 1 only?

I will leave a FIXME here.

HsiangKai updated this revision to Diff 318872.Jan 24 2021, 3:23 PM

Address @rogfer01's comments.

HsiangKai updated this revision to Diff 318900.Jan 24 2021, 9:47 PM

Assuming the alloca is at least as large as the size of the load implied by vlmax, does anything break? Vscale vectorized code for code using a stack array could generate code like Roger's example. I think a runtime check would be emitted to make sure the array (really the bounds of the loop accessing the array) is at least as large vlmax.

rogfer01 added inline comments.Jan 25 2021, 1:16 PM
llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
209

I'm not sure I'm reading the condition right:

if (!((RISCVVPseudosTable::getPseudoInfo(MI.getOpcode()) != nullptr) && MI.mayLoadOrStore()))

should be the same as

if (((RISCVVPseudosTable::getPseudoInfo(MI.getOpcode()) == nullptr) || !MI.mayLoadOrStore()))

or slightly simplified

if (!RISCVVPseudosTable::getPseudoInfo(MI.getOpcode()) || !MI.mayLoadOrStore()))

So my question is: what happens if !RISCVVPseudosTable::getPseudoInfo(MI.getOpcode()) is false (i.e. this is a RVV pseudo) and !MI.mayLoadOrStore() is false too (say it is RISCV::PseudoVLE64_V)?

I understand the LLVM IR snippet above can generate a case like this one: fixed offset but an RVV pseudo that does loads/store.

We can address this case in a later change. If we do, I suggest we add a report_fatal_error to clearly state this is not supported yet.

What do you think?

jrtc27 added inline comments.Jan 25 2021, 1:39 PM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
311

Currently it reads as an absolute statement regardless of Amount.

556

Caller can do MF.getFrameInfo().getCalleeSaveInfo() now, no need to pass it.

556

Eh, I guess sometimes it's already lying around, and there's a lot of redundancy in some of the arguments passed around elsewhere.

llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
913

Does V require M? V without M would be a stupid combination, but if it's legal we shouldn't assume otherwise.

llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
209

Your simplified form is much easier to parse; I'd prefer the patch use that instead.

221

This should never happen given we're in the else for if (!Offset.getScalable())

237

This step doesn't get a number/summary?

llvm/test/CodeGen/RISCV/rvv/allocate-lmul-2-4-8.ll
10

This should only be read once per function, especially since CSR accesses can be full pipeline flushes on some microarchitectures (though one would hope that something like vlenb wouldn't, being a constant CSR, in a high performance chip, but less code is still good). Though you'd still likely want it to be rematerialisable rather than spilled to the stack.

llvm/test/CodeGen/RISCV/rvv/localvar.ll
123

Ugh what ugly syntax they chose, should have gone with | or something (or at least put the whole thing in parentheses if they really want to use commas).

HsiangKai updated this revision to Diff 319166.Jan 25 2021, 4:41 PM
HsiangKai marked 5 inline comments as done.Jan 25 2021, 4:43 PM
HsiangKai added inline comments.
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
556

I prefer to keep it as so in this patch.

llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
237

I didn't understand this comment.

craig.topper added inline comments.Jan 25 2021, 4:47 PM
llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
237

There are comments like "1. Get vlenb..." and "2. Calculate address..." and "3. Replace address..." but this step didn't get a similar numbered comment.

HsiangKai added inline comments.Jan 25 2021, 7:08 PM
llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
237

Got it. I will refine it. Thanks.

HsiangKai updated this revision to Diff 319197.Jan 25 2021, 8:30 PM

Address @craig.topper and @rogfer01's comments.

HsiangKai marked 6 inline comments as done.Jan 25 2021, 8:32 PM
HsiangKai updated this revision to Diff 319203.Jan 25 2021, 9:08 PM
rogfer01 added inline comments.Jan 25 2021, 11:17 PM
llvm/test/CodeGen/RISCV/rvv/access-fixed-objects-by-rvv.ll
5 ↗(On Diff #319212)

I think you want nounwind here and in the function definition below.

rogfer01 added inline comments.Jan 25 2021, 11:19 PM
llvm/test/CodeGen/RISCV/rvv/access-fixed-objects-by-rvv.ll
5 ↗(On Diff #319212)

Oh nevermind, I was confused by the fact some of the other tests had nounwind.

Fix failed cases.

craig.topper added inline comments.Jan 26 2021, 5:01 PM
llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
237

Should we add a kill flag here if FrameRegIsKill is true?

Address @craig.topper's comments.

HsiangKai marked an inline comment as done.Jan 26 2021, 11:02 PM
craig.topper added inline comments.Jan 27 2021, 10:15 AM
llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
215

I think this also need getKillRegState(FrameRegIsKill)

237

Use getKillRegState(FrameRegIsKill)

HsiangKai updated this revision to Diff 319684.Jan 27 2021, 2:51 PM
HsiangKai marked 2 inline comments as done.

Ping. Is there anything I need to address or I missed?

rogfer01 accepted this revision.Feb 16 2021, 1:42 PM

I'm happy with the patch as it stands now. At least it gives us a baseline to improve if that is needed.

Thanks @HsiangKai. LGTM.

This revision is now accepted and ready to land.Feb 16 2021, 1:42 PM
This revision was landed with ongoing or failed builds.Feb 16 2021, 10:37 PM
This revision was automatically updated to reflect the committed changes.
SuH added a subscriber: SuH.Mar 29 2021, 8:24 PM

I have a question about access incoming args.

|---------------------------------|
| scalar callee-saved registers   |
|---------------------------------|
| scalar local variables          |
|---------------------------------|
| scalar outgoing arguments       |
|---------------------------------|
| RVV local variables &&          |
| RVV outgoing arguments          |
|---------------------------------| <- end of frame (sp)

In this frame layout design, if we have some args pass through stack(more than 8 scalar arguments or more than 32 vector registers passing from caller),
how to access incoming arguments frame objects across "RVV local variables" ?
(RVV local variables is currecnt frame and incoming arguments is previous frame.)

rogfer01 added a comment.EditedMar 30 2021, 2:26 AM

Hi @SuH,

I have a question about access incoming args.

We changed the RVV frame layout in D97111, does your question still hold under that new layout?

Kind regards,

SuH added a comment.Mar 30 2021, 10:21 AM

Hi @rogfer01

We changed the RVV frame layout in D97111, does your question still hold under that new layout?

Yes, the problem still exist.

|---------------------------------|------
| scalar incoming arguments       |      |
|---------------------------------|      |  Previous frame
| RVV previous local variables && |      | <-----how to know this size or how to count count # of vectors?
| RVV incoming arguments          |------
|---------------------------------| - start of frame (fp)
| scalar callee-saved registers   |
|---------------------------------|
| scalar local variables          |
|---------------------------------|
| scalar outgoing arguments       |
|---------------------------------|
| RVV local variables &&          |
| RVV outgoing arguments          |
|---------------------------------| <- end of frame (sp)

My question is how to count the scalar incoming argument with this frame layout,
because scalar incoming arguments cross RVV previous local variables frame objects.
In the current frame(callee), we don't know the RVV previous local variables count.
So we can't get the real stack offset because callee not know the caller vector local variables size.

Hi @SuH,

My question is how to count the scalar incoming argument with this frame layout,
because scalar incoming arguments cross RVV previous local variables frame objects.
In the current frame(callee), we don't know the RVV previous local variables count.
So we can't get the real stack offset because callee not know the caller vector local variables size.

My understanding is that the FrameIndexes associated to the RVV incoming arguments will have positive offsets in the MachineFrameInfo before we start emitting the frame layout (local variables should have negative offsets). FrameIndexes for RVV objects should have a different StackID too.

That said, I might be mixing things here so if you have an example showing the issue, this would help a lot.