This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Fix offset computation for RVV
ClosedPublic

Authored by rogfer01 on Mar 17 2021, 11:54 AM.

Details

Summary

In D97111 we changed the RVV frame layout when using sp or bp to address the stack slots so we could address the emergency stack slot. The idea is to put the RVV objects as far as possible (in offset terms) from the frame reference register (sp / fp / bp).

When using fp this happens naturally because the RVV objects are already the top of the stack and due to the constraints of RVV (VLENB being a power of two >= 128) the stack remains aligned. The rest of this summary does not apply to this case.

When using sp / bp we need to skip the non-RVV stack slots. The size of the the non-RVV objects is computed subtracting the callee saved register size (whose computation is added in D97111 itself) to the total size of the stack (which does not account for RVV stack slots). However, when doing so we round to 16 bytes when computing that size and we end emitting a smaller offset that may belong to a scalar stack slot (see D98801). So this change removes that rounding.

Also, because we want the RVV objects be between the non-RVV stack slots and the callee-saved register slots, we need to make sure the RVV objects are properly aligned to 8 bytes. Adding a padding of 8 would render the stack unaligned. So when allocating space for RVV (only when we don't use fp) we need to have extra padding that preserves the stack alignment. This way we can round to 8 bytes the offset that skips the non-RVV objects and we do not misalign the whole stack in the way. In some circumstances this means that the RVV objects may have padding before (=lower offsets from sp/bp) and after (before the CSR stack slots).

Unfortunately we are computing the size of the CSR stack slots (CalleeSavedStackSize) in processFunctionBeforeFrameIndicesReplaced which is called after the prologue has been emitted. So we might be less precise here though my expectation is that this is OK because we handle those now as having XLEN size. If this is a problem, then we can unconditionally add enough padding (getStackAlign()) to retain the alignment while allowing us to adjust the area of RVV-objects as needed.

Diff Detail

Event Timeline

rogfer01 created this revision.Mar 17 2021, 11:54 AM
rogfer01 requested review of this revision.Mar 17 2021, 11:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 17 2021, 11:54 AM
rogfer01 edited the summary of this revision. (Show Details)Mar 17 2021, 11:57 AM

Hum, this is not right. I just realized that we may need some extra padding between the end of the scalar part and the RVV stack slots. That padding hasn't been accounted anywhere.

khchen added a subscriber: khchen.Mar 18 2021, 4:08 AM
rogfer01 updated this revision to Diff 331546.Mar 18 2021, 7:01 AM
rogfer01 edited the summary of this revision. (Show Details)

ChangeLog:

  • Add extra padding so we can adjust RVV objects within the stack correctly
  • Round the offset to the RVV objects to 8 bytes (when not using fp).
  • Move computation of CalleeSavedStackSize from processFunctionBeforeFrameIndicesReplaced to processFunctionBeforeFrameFinalized because we need that size before we emit the prologue.
rogfer01 added a comment.EditedMar 18 2021, 7:04 AM

I feel this might be a bit complicated to grasp from my attempt of explanation in the summary, so this is how the stack should look like for wrong-stack-slot-rv32.mir assuming VLENB=128.

Looks reasonable to me, though I'm not the best person to judge the ins and outs of the RISC-V frame lowering. I must say, I'm loving the diagram!

Do any of our code-comment diagrams need updating to show off padding?

StephenFan added a comment.EditedMar 18 2021, 9:03 AM

Hi @rogfer01 ! It is reasonable to me. But I think the instruction of BuildMI(MBB, MBBI, DL, TII->get(RISCV::ADDI), SPReg) can be eliminated. Firstly, the value of calleeSavedStackSize can be regarded as a aligned value(For example, aligned to MFI.getStackAlign()). Then we can calculate the padding size by the aligned calleeSavedStackSize minus the original calleeSavedStackSize. When emits prologue, we can minus the value of MFI.getStackSize() - original calleeSavedStackSize + aligned calleeSavedStackSize. When get the offset of the rvv object, we can MFI.getStackSize() - original calleesavedStackSize, because we just want to calculate the non-calleesaved field size. My English is poor and I am a beginner of LLVM-RISCV. So I don't know if it makes sense to you.

Hi @rogfer01 ! It is reasonable to me. But I think the instruction of BuildMI(MBB, MBBI, DL, TII->get(RISCV::ADDI), SPReg) can be eliminated. Firstly, the value of calleeSavedStackSize can be regarded as a aligned value(For example, aligned to MFI.getStackAlign()). Then we can calculate the padding size by the aligned calleeSavedStackSize minus the original calleeSavedStackSize. When emits prologue, we can minus the value of MFI.getStackSize() - original calleeSavedStackSize + aligned calleeSavedStackSize. When get the offset of the rvv object, we can MFI.getStackSize() - original calleesavedStackSize, because we just want to calculate the non-calleesaved field size.

That sounds like a great idea, I will try your suggestion. Thanks a lot @StephenFan !

rogfer01 updated this revision to Diff 331637.Mar 18 2021, 11:27 AM
rogfer01 edited the summary of this revision. (Show Details)

ChangeLog:

  • Fold the padding into the main stack size enlargement: adjust CSR offsets accordingly, as suggested by @StephenFan
  • Add {get,set}RVVPadding to RISCVMachineFunctionInfo to materialise the extra padding required by RVV.
This comment was removed by StephenFan.
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
855

If we can set the RVVPadding as

RVFI->setRVVPadding(alignTo(Size, getStackAlign().value()) -Size);

or just align to 8, because we just want the RVV stack aligned to 8 bytes.

RVFI->setRVVPadding(alignTo(Size, 8) - Size);

?
In this way, maybe we can save some stack space.

855

If we can set the RVVPadding as

RVFI->setRVVPadding(alignTo(Size, getStackAlign().value()) -Size);

or just align to 8, because we just want the RVV stack aligned to 8 bytes.

RVFI->setRVVPadding(alignTo(Size, 8) - Size);

?
In this way, maybe we can save some stack space.

rogfer01 added inline comments.Mar 19 2021, 6:45 AM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
855

If we can set the RVVPadding as

RVFI->setRVVPadding(alignTo(Size, getStackAlign().value()) -Size);

This is more compact, I'll use that, thanks!.

or just align to 8, because we just want the RVV stack aligned to 8 bytes.

Unless I do something different with the RVVPadding I think I need to use getStackAlign().

Consider RV32 and a single CSR: Size will be 4 so alignTo(Size, 8) - Size is 4. StackSize before adding RVVPadding is already aligned to getStackAlign() (that value is commonly 16, even in RV32). If I add 4 bytes to the StackSize then I can align the RVV to 8 bytes but sp won't be aligned to getStackAlign() anymore. Maybe I'm missing something?

StephenFan added inline comments.Mar 19 2021, 6:53 AM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
855

You're right. I didn't consider that if the StackSize adds some value that not aligned, it will be not aligned. And then sp will not be aligned.

StephenFan accepted this revision.Mar 19 2021, 6:56 AM
This revision is now accepted and ready to land.Mar 19 2021, 6:56 AM
rogfer01 added inline comments.Mar 22 2021, 1:45 AM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
855

Then I think the first form RVFI->setRVVPadding(alignTo(Size, getStackAlign().value()) -Size); would not be correct either because we might set the padding to a smaller value than getStackAlign().

I'll add a clarifying comment here.

rogfer01 updated this revision to Diff 332358.Mar 22 2021, 10:41 AM

ChangeLog:

  • Add comment explaining why we add the alignment of the stack as RVV padding when CSR are not aligned to 8 bytes.
rogfer01 retitled this revision from [RISCV][WIP] Fix offset computation for RVV to [RISCV] Fix offset computation for RVV.Mar 23 2021, 8:39 AM

I have internally validated that the last change (Diff 331637, not Diff 332358 that only adds a comment) is equivalent to the previous change (Diff 331546).

If no objections arise in the next days (and after I have addressed one comment in D98801) I'll land this.

rogfer01 updated this revision to Diff 333907.Mar 29 2021, 9:30 AM

ChangeLog:

  • Rebase
This revision was landed with ongoing or failed builds.Mar 29 2021, 10:04 AM
This revision was automatically updated to reflect the committed changes.